Paper SMART Data Visualization and Exploration en 1_1

7/23/2019 Paper SMART Data Visualization and Exploration en 1_1


SMART Data Visualization and Exploration

Filipe Clérigo, Ricardo Raminhos, Rui Estevão

VIATECLA SA

[email protected],[email protected],[email protected]

Teresa Gonçalves, Pedro Melgueira

Universidade de Évora

[email protected],[email protected]

Summary

The continuous growth in the volume of information/data does not bring a proportional increase in the knowledge derived from it. In some cases, the growth of information even contributes to a decline in the quality of that knowledge. Automatic analysis and visual inspection mechanisms (normally in a supervised format) therefore represent an important added value, especially when they are naturally integrated into repositories specialized in managing large volumes of content (i.e. CMS, Content Management Systems).

As some of these repositories are open, they give the organisations that use them a high level of flexibility, since business data structures can be modelled freely. However, this also means the repositories are not restricted to a specific information domain, which makes the interpretation and visual presentation of data a great challenge, as its structure is not known beforehand.

Addressing this challenge is the main purpose of the SMART Content Provider prototype. This paper presents the results obtained in its data visualization and exploration areas, applied to open repositories of information.

The SMART Content Provider (CP) Project

Through the SMART CP [1] project, research on enhancing intelligence in CMS environments was carried out under three main pillars:

(i) Enhance mechanisms for the aggregation of heterogeneous information (where the structures and objects are not known beforehand);

(ii) Define and apply Artificial Intelligence algorithms, in particular for the detection of patterns in semi-structured information;

(iii) Apply data presentation mechanisms to results/contents, exploring non-conventional formats and forms of information representation that contribute to a more fluid exploration of knowledge.

The knowledge resulting from this research has been materialized in a prototype of a generic platform for data visualization and interaction, referred to as SMART Content Provider (CP), a project developed by VIATECLA [2], supported by Universidade de Évora [3] and GTE Consultores [4], and co-financed by QREN (Quadro de Referência Estratégico Nacional) [5].

The present paper focuses only on the third pillar of the project, related to the presentation and exploration of information. A general presentation of the project, in terms of its objectives, architecture and results, can be found in the paper "SMART Content Provider" [6], whilst a detailed presentation of the application of AI algorithms is available in the paper "Data Clustering for heterogeneous data" [7].

Architecture

Figure 1 shows a global view of the SMART CP platform architecture. A three-colour scheme is used to characterize the functional blocks that compose the platform and its external interactions:

Orange: blocks completely external to the platform, with which the SMART CP platform interacts to obtain data/contents;

Green: functional blocks with which the SMART CP platform is integrated, i.e. the native content management system that supports the platform;

Purple: native blocks of the SMART CP platform.

Figure 1: General diagram of the architecture of the platform SMART CP

The architecture of the SMART CP platform follows a classic client/server paradigm, as presented in Figure 1. Blocks of the server component are represented at the top of the image, and blocks of the client component at the bottom. Because the SMART CP platform uses data/contents present in content management systems, all client functional groups (i.e. data sorting, data visuals and exploration, accountability and workflows) are integrated in the content management system backoffice itself.

[Figure 1 block labels. Layers: Server Layer, Client Layer. Blocks: SMART Aggregation, Scriptor Server Core (External Content Manager), SMART Data Layer, SMART Import, REST API, JSON Data Formatter, MS Excel (External), MS SQL Database (External), Third Party External Content Manager, Scriptor Server Backoffice (External Content Manager), SMART Analyser, Scriptor Server API. Client functional groups: Data Sorting, Data Visuals and Exploration, Accountability, Workflows. Client components: SMART Views, SMART Elastic, SMART Magic Board, SMART Timeline, SMART Navigation, SMART Graphs, SMART State.]

State of the Art

Several interesting visual approaches are emerging that support visualization, content manipulation and exploration in line with the representation objectives of SMART CP. Some of these approaches are analysed next. Although some of these visual representations are purely conceptual, they can be adapted to business analytics and clustering contexts, both primary lines of research for the SMART CP project.

Figure 2: Examples of data visualization and exploration approaches

Figure 2 (top left) presents a visualization method that conveys a hierarchy across different classes of data [8, 9], while also giving a balanced perception of the data classes at each level simultaneously. In this example, the data is presented at two levels only (internal and external). However, further levels can be added progressively without the diagram becoming excessively confusing.

The representation at the top centre shows a simple but interesting mapping of the number of event occurrences of each variable onto an area [10, 11]. This makes it possible to observe which variables are dominant and, most importantly, to compare their orders of magnitude [12, 13].

At the top right, the figure shows what is known as a "constellation" [14, 15]. This concept is used to represent connections between data, as is the case with graphs, which can be drawn with several shapes and colours, with three-dimensional effects or on a plane. Nodes of the constellation can carry extra information through changes in colour, size or shape, which distinguishes them from one another. It is thus possible to place a great quantity of information on the constellation without it becoming excessively confusing, and also to represent and highlight the presence of clusters in the data.

At the bottom left, a diagram in the form of a circle is shown [16, 17]. The objects to be analysed are placed around the outside of the circle, while the relations between them are shown inside it. Some visual constructs can vary in order to help the comprehension and differentiation of the data. As seen in the example, the most important connections are made visible through the thickness of the connecting lines.

The representations at the bottom centre and bottom right present a simple object distribution matrix [18, 19]. The first shows unorganised raw data with a high level of entropy. In the second, the data has been reorganised through clustering techniques and grouped according to its degree of similarity. The result is a diagram in which it is easy to detect and observe groups of data that were previously scattered and difficult to identify.

Some of these visual concepts have been applied to the SMART CP visualization components, as presented in the following sections.

SMART components for data exploration and visualization

Regarding the client layer, as mentioned before, all components developed are integrated in the backoffice of the online CMS platform. From a conceptual point of view, the seven components are grouped in four main areas:

1. Data Sorting and Filtering;

2. Data Visuals and Exploration;

3. Accountability;

4. Business Workflows.

The Data Sorting area is materialized through the SMART Views component. This component allows content sorting and filtering operations in an intelligent and completely generic way (i.e. without knowing the content structure beforehand). Views can be kept private or made public. The contents processed by the SMART Views component can be viewed directly as a simple list, or the results can later be used as a data source for other visual components (e.g. SMART Graphs, SMART Elastic).
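
The generic sorting and filtering described above can be illustrated with a minimal Python sketch. It assumes contents are plain dictionaries whose keys are not known in advance; the function names discover_attributes and apply_view are hypothetical and do not come from the SMART Views implementation.

```python
def discover_attributes(contents):
    """Return the sorted union of attribute names found across all contents."""
    names = set()
    for content in contents:
        names.update(content.keys())
    return sorted(names)


def apply_view(contents, filters=None, sort_by=None):
    """Keep contents matching every (attribute, value) filter, then sort them.

    Both `filters` and `sort_by` refer to attribute names discovered at
    run time, so no content structure is hard-coded (assumption of this sketch).
    """
    filters = filters or {}
    selected = [
        c for c in contents
        if all(c.get(attr) == value for attr, value in filters.items())
    ]
    if sort_by is not None:
        # Contents lacking the sort attribute are placed last.
        selected.sort(key=lambda c: (sort_by not in c, str(c.get(sort_by, ""))))
    return selected
```

A view could then be stored simply as its pair (filters, sort_by) and reused as the data source of another component.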

The Data Visuals and Exploration area comprises the following components: SMART Elastic, SMART Magic Board, SMART Graphs and SMART Navigation. The first two components are presented in detail in the following sections.

The SMART Navigation component relates to the presentation of metrics and of the possible actions to perform over an aggregated set of contents, following a dashboard/control panel logic. Through charts, listings and metrics associated with different colour levels, it is possible to identify graphically the limit situations that require further attention from the manager/administrator.
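
As an illustration of metrics associated with colour levels, the following Python sketch classifies a metric value against two thresholds. The thresholds, the colour names and the function colour_level are assumptions made for this example; the paper does not specify how SMART Navigation defines its levels.

```python
def colour_level(value, warning, critical):
    """Map a metric value to a colour level using two hypothetical thresholds."""
    if value >= critical:
        return "red"      # limit situation requiring attention
    if value >= warning:
        return "amber"    # approaching the limit
    return "green"        # within the expected range


def flag_metrics(metrics, warning, critical):
    """Return only the metrics whose level is not green, with their colour."""
    levels = {name: colour_level(v, warning, critical) for name, v in metrics.items()}
    return {name: level for name, level in levels.items() if level != "green"}
```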

The SMART Graphs component presents a relatively standard set of charts, where the user can check the distribution of results taken from the selected sample. Although this component is not innovative in itself, its graphical and information exploration aspect was important to implement, as it provides relevant information mainly to users who are less experienced in content exploration processes.

The Accountability area is represented by the SMART Timeline component, which is presented in detail in a later section.

Finally, the Workflows area is represented by the SMART State component. The Scriptor Server platform has an internal workflow engine, which was enriched with this SMART CP graphical component, allowing the generic creation of workflows/business flows. With this generic graphical component (of minimal technical complexity), administrators can build business workflows specific to their domain, without being restricted to pre-designed ones.



SMART Magic Board

The Magic Board graphic component allows contents of multiple dimensions to be represented and explored simultaneously on a 2D plane, in which one or more attributes are shown on the horizontal axis and one or more attributes on the vertical axis. Attribute values can additionally be mapped to colour, shape and/or size.

Like the other visual components, it is integrated within the backoffice of the content manager. However, due to the space/visibility requirements of the exploration area, the component can also be used in full-screen mode.

This component uses data previously aggregated by the platform (by its SMART Aggregation server component) in order to present results quickly: all computation and aggregation is carried out when data is entered and updated in the CMS, rather than on request.

Initially, the user defines which attribute dimensions to explore (Figure 3, left). For example, for an object/content that represents an issue of a ticketing tool (e.g. a clarification request, amendment or bug report), selecting and dragging the attribute "Environment" to the horizontal axis produces the result in Figure 3 (right). On this screen, contents are presented only for the values for which the attribute has results, and are placed randomly in the available space, with each content mapped to a circular icon.

Note that in this representation the main concern, at least at first, is to understand how the set of contents is distributed globally, not to analyse each content individually. However, the information about a specific content can be accessed at any time by clicking on its icon (a message with the name/title of the content is shown). If the user clicks a second time on the icon, a window previewing the content data is opened.

Figure 3: Data initialization on the Magic Board

It is equally possible to select further attributes and place them on the vertical axis. With an attribute already defined on the horizontal axis, this results in a two-axis plane (Figure 4, left). In the given example, it is easy to see that the great majority of the issues stored in the CMS platform are of type "task" and are associated with the "development" environment. Requests associated with the "pre-production" and "production" environments are relatively few in comparison with the other environments. On the other hand, issues of type "request" and "bug" can only be found in the "development" environment.



Finer partitions can be made on the horizontal and/or vertical axis by adding attributes at a second level. The screen in Figure 4 (right) shows the addition of the secondary attribute "Assigned To" to the horizontal axis. Note that this attribute also spreads the results very little, as few results are assigned to the design team; in fact, most results are assigned to VIATECLA.
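
The placement of contents on the board, with one or more attributes per axis, can be sketched in Python as a grouping of contents into cells. This is only an illustration under stated assumptions: contents are dictionaries, and the names place_on_board and cell_counts are hypothetical rather than part of the Magic Board implementation.

```python
from collections import defaultdict


def place_on_board(contents, horizontal, vertical=()):
    """Group contents into board cells keyed by their axis attribute values.

    `horizontal` and `vertical` are sequences of attribute names; a second
    name on an axis gives the finer, second-level partition. Contents with
    no value for one of the axis attributes are left off the board.
    """
    axes = tuple(horizontal) + tuple(vertical)
    cells = defaultdict(list)
    for content in contents:
        if any(attr not in content for attr in axes):
            continue
        column = tuple(content[attr] for attr in horizontal)
        row = tuple(content[attr] for attr in vertical)
        cells[(column, row)].append(content)
    return dict(cells)


def cell_counts(cells):
    """Number of contents per cell, useful to read the global distribution."""
    return {key: len(items) for key, items in cells.items()}
```

Within a cell, each content would then be drawn as an icon at a random position, as the text describes.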

Figure 4: Attribute addition on a 2D plane (with a single dimension on the horizontal axis, left | with two dimensions on the horizontal axis, right)

Figure 5 shows a similar example of cross-checking two attributes on the horizontal and vertical axes, reflecting a real case of a VIATECLA client that participated in the prototype validation. In this case, contents have been cross-checked by the attributes "Area" (horizontal axis) and "Assigned To" (vertical axis), i.e. requests assigned to VIATECLA employees by project area.

Figure 5: Graphic representation of contents on the horizontal dimension "Area" and vertical dimension "Assigned Person"



Note that, in addition to expressing attribute dimensions on the horizontal and vertical axes, it is also possible to represent them through colour, shape and size. This is specified in the "Visuals" area of the component. For colour, when an attribute of enumerated type is selected, an option for mapping its values to a colour range is presented (Figure 6). A similar approach is followed for mapping to a range of shapes (Figure 7, left) and sizes (Figure 7, right).
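
The mapping of enumerated values to a visual range can be sketched as follows in Python. The palette, shape names and sizes are assumptions chosen for the example, and map_enumeration is a hypothetical helper; the paper does not state how the component chooses its ranges.

```python
from itertools import cycle

# Hypothetical visual ranges; the actual ranges offered by the component are not known.
COLOURS = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]
SHAPES = ["circle", "square", "triangle"]
SIZES = [8, 12, 16]


def map_enumeration(values, visual_range):
    """Assign each distinct value one entry of the range, cycling if it runs out."""
    distinct = sorted(set(values))
    return dict(zip(distinct, cycle(visual_range)))


def style_content(content, colour_by, colour_map, default="#999999"):
    """Return the colour of one content icon, with a default for unmapped values."""
    return colour_map.get(content.get(colour_by), default)
```

The same map_enumeration call works for shapes and sizes by passing SHAPES or SIZES as the range.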

Figure 6: Representation of the data dimension through a colour range

Figure 7: Representation of the data dimensions through shape and size ranges

The configuration section of the SMART Magic Board also includes a filter area that limits the universe of contents (regardless of the shape and format in which contents are represented). Similarly to the configuration of the "Visuals" area, this area allows the user to select an attribute and define which of its values should (or should not) be considered by the filter.

Finally, when more than two attributes are to be added to the horizontal or vertical axis, or when the two selected attributes have a low spread level, which would result in a very high number of combinations, it is possible to drill down into a specific quadrant selected by the user. Thus, in the example of Figure 8 (left), if the user selects the quadrant in the top left corner, a drill-down is performed and the universe of results becomes that of the selected quadrant, at the lower level (Figure 8, right). The attributes previously selected (i.e. "Environment" and "Type") are fixed, and the user can drag other attributes to the horizontal and vertical axes in order to decompose/explore further the information present at this level.
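
A drill-down of this kind can be sketched in Python as a state that narrows the universe and records the fixed attributes. The class BoardState and its fields are hypothetical names introduced for this illustration, assuming contents are dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class BoardState:
    """Current universe of contents plus the attribute values fixed so far."""
    universe: list
    fixed: dict = field(default_factory=dict)

    def drill_down(self, quadrant):
        """Return the state one level below, restricted to the selected quadrant.

        `quadrant` maps each axis attribute to the value of the chosen cell,
        e.g. {"Environment": "development", "Type": "task"}.
        """
        subset = [
            c for c in self.universe
            if all(c.get(attr) == value for attr, value in quadrant.items())
        ]
        return BoardState(subset, {**self.fixed, **quadrant})
```

Returning a new state rather than mutating the current one keeps the upper level available, which is a design choice of this sketch and not something the paper describes.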

Figure 8: Drill-down mechanism application to the data universe

SMART Elastic

The SMART Elastic component allows the creation of dynamic filters that can be applied to contents whose structure is not previously known, as well as the definition of how the resulting contents should appear on screen (i.e. which fields of information are shown).

For the configuration itself (Figure 9), the user is asked which attribute fields of the object should be used as dynamic/elastic filters, which attribute fields should appear on the form that represents a content result, and which field should be used for sorting/ordering the contents. In this example, selecting the attributes "Status", "Project", "Area", "Environment", "Priority" and "Assigned Persons" as filtering fields and the attribute "Title" as a detail field results in Figure 10.

The term "Elastic" arises from the fact that, when a value is selected in one of the filtering dimensions, the values of the remaining filters and the filtered contents are recalculated in an elastic way: in the remaining filters, values that can still filter the contents further are kept, and all values with no associated contents are removed.

In this way, Boolean filtering rules with the AND operator are created by selecting values in different filter columns, and rules with the OR operator by selecting more than one attribute value in the same filter column.
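
The elastic behaviour and its AND/OR semantics can be illustrated with a short Python sketch over in-memory dictionaries. The functions matches and elastic_filter are hypothetical. One choice here is an assumption rather than something stated in the paper: the values offered in a column are computed while ignoring that column's own selection, so that OR alternatives in the same column remain selectable.

```python
def matches(content, selection, skip=None):
    """AND across filter columns, OR among the values chosen within a column."""
    return all(
        content.get(attr) in chosen
        for attr, chosen in selection.items()
        if chosen and attr != skip
    )


def elastic_filter(contents, filter_attrs, selection):
    """Return the filtered contents and the values still offered per column.

    `selection` maps an attribute name to the set of values selected in
    its column; columns without a selection do not restrict the results.
    """
    results = [c for c in contents if matches(c, selection)]
    remaining = {}
    for attr in filter_attrs:
        # Values with no associated contents under the other columns' selection are dropped.
        pool = [c for c in contents if matches(c, selection, skip=attr)]
        remaining[attr] = sorted({c[attr] for c in pool if attr in c})
    return results, remaining
```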



Figure 9: Initial screen for the dimension specification for the SMART Elastic component

Figure 10: Initial result presentation with no filter configuration

Since this component is strongly oriented towards the exploration of results, mainly by trial and error, it is essential that response times are very short. With this in mind, the data used by the component for presentation is not the actual raw contents, but a set of indexes and pre-aggregated values made available by the SMART Aggregation server component. The computation and aggregation effort is thus carried out incrementally when contents are created and updated; at visualization time, the data only needs to be presented, as it has already been processed.
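
The idea of paying the aggregation cost at write time can be sketched in Python with a small inverted index that is updated whenever a content is created or changed. The class AggregationIndex and its methods are hypothetical and only illustrate the principle; they are not the SMART Aggregation component, and attribute values are assumed to be hashable.

```python
from collections import defaultdict


class AggregationIndex:
    """Inverted index from (attribute, value) to content ids, kept up to date on writes."""

    def __init__(self):
        self._contents = {}
        self._index = defaultdict(set)

    def upsert(self, content_id, attributes):
        """Index a new content, or re-index an existing one after an update."""
        self.remove(content_id)
        self._contents[content_id] = dict(attributes)
        for item in attributes.items():
            self._index[item].add(content_id)

    def remove(self, content_id):
        """Drop a content and the index entries that pointed to it."""
        old = self._contents.pop(content_id, None)
        if old is None:
            return
        for item in old.items():
            ids = self._index[item]
            ids.discard(content_id)
            if not ids:
                del self._index[item]

    def count(self, attr, value):
        """Pre-aggregated number of contents having this attribute value."""
        return len(self._index.get((attr, value), ()))

    def ids_matching(self, criteria):
        """Ids of contents matching every (attribute, value) pair in `criteria`."""
        sets = [self._index.get(item, set()) for item in criteria.items()]
        if not sets:
            return set(self._contents)
        return set.intersection(*sets)
```

At query time only set lookups and intersections are needed, which is what keeps interactive filtering fast in this sketch.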

As an example, changing the filtering rule Priority (with value Major) AND Project (with value K4T Mobile Apps) to Priority (with value Major) AND Workflow Status (with value NotAnIssue), shown in Figure 11, is carried out with two interactions (removal of the filter "Project" and addition of the filter "Workflow Status"), reducing the number of results obtained from 536 to 6 within 1 to 2 seconds.

Figure 11: Result presentation upon dimension filtering

SMART Timeline

The SMART Timeline component is specialized in the representation of contents on a time axis. However, it follows a different approach: it does not place a content according to one of its date-type attributes, but considers the dates when changes to the content occurred (either at the attribute/field level or through workflow changes).

The component addresses a very important question regarding accountability in content handling, which content managers sometimes neglect or answer only with a view of the information that is too technical (e.g. in the form of log files). With the timeline it is possible to inspect which contents have been amended or have changed status/workflow, with the information before and after each change presented in graphic form, making it easy to identify who made which change and when on the time axis.

Figure 12 (left) shows a graphic presentation of the timeline. By default, with no option selected on the component, time events are distributed evenly along the timeline. At the top of the screen (left side) it is possible to view the start and end dates of the amendments and the period of time between those dates (this indicator can be of great importance in the case of Service Level Agreement contract clauses).
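
The data behind such a timeline can be sketched in Python as a list of change events, each recording who changed which attribute, when, and the values before and after. The names ChangeEvent, diff_versions, timeline, elapsed and changes_by are hypothetical, and deriving events by comparing two versions of a content is an assumption of this example rather than the method used by the platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    content_id: str
    user: str
    when: datetime
    attribute: str
    before: object
    after: object


def diff_versions(content_id, user, when, old, new):
    """Create one event per attribute whose value differs between two versions."""
    events = []
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            events.append(
                ChangeEvent(content_id, user, when, name, old.get(name), new.get(name))
            )
    return events


def timeline(events):
    """Events ordered along the time axis."""
    return sorted(events, key=lambda e: e.when)


def elapsed(events):
    """Period between the first and the last change (e.g. for SLA checks)."""
    if not events:
        return timedelta(0)
    ordered = timeline(events)
    return ordered[-1].when - ordered[0].when


def changes_by(events, user):
    """Events made by one user, to highlight that user's actions."""
    return [e for e in timeline(events) if e.user == user]
```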



Figure 14: Graphic representation of time occurrences (highlight of actions by the same user)

Figure 15: Inspection of changes made to a certain content

Evaluation and Future work

The correctness of the SMART CP prototype can only be assessed through its effective use. During the final stages of its development, a pilot was made available so that the platform could be refined according to the feedback collected.

For the test pilot, a ticketing system already in use at VIATECLA, named One System (for communication between client and supplier), was selected. The One System is a collaborative tool aimed at improving productivity, in which the client and the development team create issues and give adequate follow-up to each situation, which may or may not be critical to the business, within the context of the client's project. This solution simplifies communication with the development, operation and project teams.

Clients have at their disposal facilities that allow them to communicate in real time about technical questions, to request clarification about the use of functionalities of any VIATECLA platform, or to make general comments.

The test pilot has been a major success, exceeding all initial expectations. According to the feedback received, both the technical teams (not involved in the initial project) and the VIATECLA clients that participated informally in the project validation were willing to keep using this environment in a more operational way after the validation was finished. This constitutes a recognition of the added value the SMART CP platform brings.

As the test pilot focused on the administration of a high volume of contents (e.g. issues) with different priority levels, the components SMART Views, SMART Elastic, SMART Magic Board, SMART Navigation and SMART Timeline were the ones that received the most positive feedback, because of their impact on information management: they made it more visual and comprehensible, allowing users to explore it through drill-down tools and multi-filter criteria.

Future work on SMART CP includes the definition of strategies for launching the platform on the market, with the aim of obtaining more and better feedback from the effective use of the platform to improve and innovate on the work carried out.

References

[1] Microsite SMART CP. 2015, http://www.viatecla.com/inovacao/smart_content_provider

[2] VIATECLA, Institutional website. 2015, http://www.viatecla.com

[3] University of Évora, Institutional website. 2015, http://www.uevora.pt/

[4] GTE, Institutional website. 2015, http://www.gte.pt/

[5] National Strategic Reference Framework (NSRF), Institutional website. 2015, http://www.qren.pt/np4/home

[6] Clérigo, Filipe. Raminhos, Ricardo. Estevão, Rui. Gonçalves, Teresa. Melgueira, Pedro.: SMART Content Provider, 2015

[7] Gonçalves, Teresa. Melgueira, Pedro. Clérigo, Filipe. Raminhos, Ricardo. Estevão, Rui.: Data Clustering for heterogeneous data, 2015

[8] Draper, G.; Livnat, Y.; Riesenfeld, R.F., "A Survey of Radial Methods for Information Visualization," in Visualization and Computer Graphics, IEEE Transactions on, vol.15, no.5, pp.759-776, Sept.-Oct. 2009, doi: 10.1109/TVCG.2009.23

[9] Diehl, S.; Beck, F.; Burch, M., "Uncovering Strengths and Weaknesses of Radial Visualizations---an Empirical Approach," in Visualization and Computer Graphics, IEEE Transactions on, vol.16, no.6, pp.935-942, Nov.-Dec. 2010, doi: 10.1109/TVCG.2010.209

[10] Bruls, Mark. Huizing, Kees. Van Wijk, Jarke J.: Squarified Treemaps, in Book: Data Visualization 2010, pages: 33-42. Eurographics, Springer Vienna

[11] Benjamin B. Bederson, Ben Shneiderman, and Martin Wattenberg. 2002. Ordered and quantum treemaps: Making effective use of 2D space to display hierarchies. ACM Trans. Graph. 21, 4 (October 2002), 833-854. DOI=10.1145/571647.571649

[12] Benjamin B. Bederson. PhotoMesa: a zoomable image browser using quantum treemaps and bubblemaps. In Proceedings of the 14th annual ACM symposium on User interface software and technology (UIST '01). ACM, New York, NY, USA, 71-80. DOI=10.1145/502348.502359

[13] Ben Shneiderman. Tree visualization with tree-maps: 2-d space-filling approach. ACM Trans. Graph. 11, 1, 92-99. DOI=10.1145/102377.115768

[14] Steven Noel and Sushil Jajodia. 2004. Managing attack graph complexity through visual hierarchical aggregation. In Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security (VizSEC/DMSEC '04). ACM, New York, NY, USA, 109-118. DOI=10.1145/1029208.1029225

[15] Wakimoto, Kazumasa. Taguri, Masaaki.: Constellation graphical method for representing multi-dimensional data. Annals of the Institute of Statistical Mathematics, Kluwer Academic Publishers

[16] Krzywinski, Martin. Birol, Inanc. JM Jones, Steven. Marra, Marco A.: Hive plots - rational approach to visualizing networks. Brief Bioinform (2012) 13 (5): 627-644, first published online December 9, 2011, doi:10.1093/bib/bbr069

[17] Braun, Lothar and Volke, Mario and Schlamp, Johann and von Bodisco, Alexander and Carle, Georg.: Flow-inspector: a framework for visualizing network flow data using current web technologies. Springer Vienna

[18] Han-Ming Wu, Yin-Jing Tien, Chun-houh Chen, GAP: A graphical environment for matrix visualization and cluster analysis, Computational Statistics & Data Analysis, Volume 54, Issue 3, 1 March 2010, Pages 767-778, ISSN 0167-9473

[19] Henry, Nathalie. Fekete, Jean-Daniel.: MatLink: Enhanced Matrix Visualization for Analyzing Social Networks. Lecture Notes in Computer Science, Human-Computer Interaction - INTERACT 2007
