Paper SMART Data Visualization and Exploration en 1_1


  • 7/23/2019 Paper SMART Data Visualization and Exploration en 1_1

    1/14

    SMART Data Visualization and Exploration

    Filipe Clérigo, Ricardo Raminhos, Rui Estevão

    VIATECLA SA


    Teresa Gonçalves, Pedro Melgueira

    Universidade de Évora


    Summary

    The continuous growth in the volume of information/data does not bring a proportional increase in the related knowledge. In some cases, the increase in information even contributes to a decline in the quality of that knowledge. Automatic analysis and visual inspection mechanisms (normally in a supervised format) represent an important added value, especially when they are naturally integrated in repositories specialized in managing large volumes of content (i.e. CMS, Content Management Systems).

    As some of these repositories are open, they give a high level of flexibility to the organisations that use them, since business data structures can be modelled freely. However, this also means they are not restricted to a specific information domain, which poses a great challenge for the way data is interpreted and visually presented, as its structure is not known beforehand.

    Addressing this challenge is the main purpose of the SMART Content Provider prototype. This paper presents the results obtained in its data visualization and exploration areas, applied to open repositories of information.

    The SMART Content Provider (CP) Project

    Through the SMART CP [1] project, research on enhancing intelligence in CMS environments was carried out under three main pillars:

    (i) Enhance mechanisms for the aggregation of heterogeneous information (where the structures and objects are not known beforehand),

    (ii) Define and apply Artificial Intelligence algorithms, in particular for the detection of patterns in semi-structured information,

    (iii) Apply data presentation mechanisms to results/contents, exploring non-conventional formats and ways of representing information that contribute to a more fluid knowledge exploration.

    The knowledge resulting from this research has been materialized in a prototype of a generic platform for data visualization and interaction, referred to as SMART Content Provider (CP), a project developed by VIATECLA [2], supported by Universidade de Évora [3] and GTE Consultores [4], and co-financed by QREN (Quadro de Referência Estratégico Nacional) [5].

    The present paper focuses only on the third pillar of the project, related to the presentation and exploration of information. A general presentation of the project, in terms of its objectives, architecture and results, can be found in the paper "SMART Content Provider" [6], whilst a detailed presentation of the application of AI algorithms is available in the paper "Data Clustering for heterogeneous data" [7].

    Architecture

    Figure 1 shows a global view of the SMART CP platform architecture. A three-colour scheme is used to characterize the functional blocks that compose the platform and its external interactions:

    Orange: blocks completely external to the platform, with which the SMART CP platform interacts to obtain data/contents;

    Green: functional blocks with which the SMART CP platform is integrated, i.e. the native content management system that supports the platform;

    Purple: native blocks of the SMART CP platform.

    Figure 1: General diagram of the architecture of the platform SMART CP

    The architecture of the SMART CP platform follows a classic client/server paradigm, as presented in Figure 1. Blocks of the server component are represented at the top of the image, and blocks of the client components at the bottom. Because the SMART CP platform uses data/contents present in content management systems, all client functional groups (i.e. data sorting, data visuals and exploration, accountability and workflows) are integrated in the content management system backoffice itself.

    [Figure 1 block labels: SMART Aggregation; Scriptor Server Core (External Content Manager); SMART Data Layer; SMART Import; REST API; JSON Data Formatter; MS Excel (External); MS SQL Database (External); Third Party External Content Manager; Scriptor Server Backoffice (External Content Manager); SMART Analyser; Data Sorting: SMART Views; Data Visuals and Exploration: SMART Elastic, SMART Magic Board, SMART Navigation, SMART Graphs; Accountability: SMART Timeline; Workflows: SMART State; Scriptor Server API; Server Layer; Client Layer]


    State of the Art

    Several interesting visual approaches are emerging that allow visualization, content manipulation and exploration in line with the representation objectives of SMART CP. Some of these approaches are analysed next. Although some of these visual representation theories are purely conceptual, they can be easily adapted to business analytics and clustering contexts, both primary lines of research for the SMART CP project.

    Figure 2: Examples of data visualization and exploration approaches

    Figure 2 (top left) presents a visualization method that conveys a hierarchy across different classes of data [8, 9], while also giving a balanced perception of those data classes at each level simultaneously. In this example, the data is presented at two levels only (internal and external). However, further levels can be added progressively without the diagram becoming excessively confusing.

    The representation at the top centre shows a simple but interesting mapping of the number of event incidences for each variable, represented as an area [10, 11]. This makes it possible to observe which variables are dominant and, most importantly, to compare the orders of magnitude between them [12, 13].

    At the top right, the figure shows what is known as a constellation [14, 15]. This concept is used to represent connections between data, as is the case of graphs, which can be presented using several shapes and colours, with three-dimensional effects or on a plane. Varying the format of the constellation nodes (colour, size or shape) encodes extra information that distinguishes nodes from one another, so a great quantity of information can be placed on the constellation without it becoming excessively confusing. It is also possible to represent and highlight the presence of clusters in the represented data.

    At the bottom left, a diagram in the form of a circle is presented [16, 17]. The objects to be analysed are placed around the outside of the circle, while the relations between them are shown on the inside. Some visual constructs can vary in order to help the comprehension and differentiation of the data. As seen in the example, the most important connections are made visible through the thickness of the connecting lines.

    The representations at the bottom centre and bottom right present a simple object distribution matrix [18, 19]. The first shows unorganised raw data with a high level of entropy. In the second, the data has been reorganised through clustering techniques and grouped according to its degree of similarity. The result is a diagram in which it is easy to detect and observe groups of data that were previously scattered and difficult to identify.

    Some of these visual concepts have been applied to the SMART CP visualization components, as presented in the following sections.

    SMART components for data exploration and visualization

    Regarding the client layer, as mentioned before, all components developed are integrated in the backoffice of the online CMS platform. From a conceptual point of view, the seven components are grouped in four main areas:

    1. Data Sorting and Filtering;

    2. Data Visuals and Exploration;

    3. Accountability;

    4. Business Workflows.

    The Data Sorting area is materialized through the SMART Views component. This component allows content sorting and filtering operations in an intelligent and completely generic way (i.e. without knowing the content structure beforehand). These views can be defined as private or made public. The contents processed by the SMART Views component can be viewed directly, listed in a simple way, or the results can later be used as a data source for other visual components (e.g. SMART Graphs, SMART Elastic).

    The Data Visuals and Exploration area comprises the following components: SMART Elastic, SMART Magic Board, SMART Graphs and SMART Navigation. The first two components are presented in detail in the following sections.

    The SMART Navigation component presents metrics, and the possible actions to perform over an aggregated set of contents, following a dashboard/control panel logic. Through charts, listings and metrics associated with different colour levels, it is possible to identify graphically the borderline situations that require further attention from the manager/administrator.

    The SMART Graphs component presents a relatively standard set of charts, where the user can verify the spread of results taken from the selected sample. Although this component is not innovative by itself, its graphical and information exploration aspect was very important to implement, as it provides relevant information, mainly to users who are less experienced in content exploration processes.

    The Accountability area is represented by the SMART Timeline component, which is presented in detail in a later section.

    Finally, the Workflows area is represented by the SMART State component. The Scriptor Server platform has an internal workflow engine, which was enriched with this SMART CP graphical component to allow the generic creation of workflows/business flows. With this generic graphical component (of minimal technical complexity), administrators can build business workflows specific to their domain, without being restricted to pre-designed ones.


    SMART Magic Board

    The Magic Board graphical component makes it possible to represent and explore contents along multiple dimensions simultaneously, through their representation on a 2D plane in which one or more attributes are shown on the horizontal axis and one or more attributes on the vertical axis, together with the ability to map attribute values to colour, shape and/or size.

    Like the other visual components, it is integrated within the backoffice of the content manager. However, due to the space/visibility requirements of the exploration area, the component can also be used in full screen mode.

    This component uses data previously aggregated by the platform (by its SMART Aggregation server component) in order to allow a quick presentation of results: all computation and aggregation processes are carried out at the time data is entered and updated in the CMS (rather than being computed on request).

    Initially, the user defines which attribute dimensions to explore (Figure 3, on the left). For example, for a given object/content that represents an issue of a ticketing tool (e.g. a clarification request, amendment or bug report), selecting and dragging the attribute "Environment" to the horizontal axis produces the result in Figure 3, on the right. On this screen, contents are presented only for the values for which the attribute has results; they are placed randomly in the available space, with each content mapped to a circular icon.

    Note that in this representation the main concern, at least at first, is understanding how the set of contents is spread globally, not analysing each content individually. However, it is possible to access the information about a specific content at any time by clicking on its icon (a message with the name/title of the content is then shown). If the user clicks a second time on the content icon, a window previewing the content data is opened.

    Figure 3: Data initialization on the Magic Board

    It is equally possible to select further attributes and place them on the vertical axis. With an attribute previously defined on the horizontal axis, this results in a two-axis plane (Figure 4, on the left). In the given example, it is easy to see that the great majority of the issues included in the CMS platform relate to "tasks" and are associated with the "development" environment. The number of requests associated with the "pre-production" and "production" environments is relatively low in comparison with the other environments. On the other hand, issues of the types "request" and "bug" can only be found in the "development" environment.
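    The two-axis layout described above amounts to grouping contents by the pair of values they hold for the two chosen attributes. The following is a minimal sketch of that idea, assuming contents are plain dictionaries; the function name `cross_tabulate` and the sample issues are hypothetical and are not taken from the SMART CP implementation.

```python
from collections import defaultdict


def cross_tabulate(contents, x_attr, y_attr):
    """Group contents into board cells keyed by (x value, y value)."""
    cells = defaultdict(list)
    for item in contents:
        # Contents lacking either attribute are skipped: the board only
        # shows values for which the attribute has results.
        if x_attr in item and y_attr in item:
            cells[(item[x_attr], item[y_attr])].append(item)
    return dict(cells)


# Hypothetical sample data, loosely modelled on the ticketing example.
issues = [
    {"title": "A", "Environment": "development", "Type": "task"},
    {"title": "B", "Environment": "development", "Type": "bug"},
    {"title": "C", "Environment": "production", "Type": "task"},
    {"title": "D", "Environment": "development", "Type": "task"},
    {"title": "E", "Type": "task"},  # no Environment: not placed on the board
]

board = cross_tabulate(issues, "Environment", "Type")
cell_sizes = {cell: len(items) for cell, items in board.items()}
```

    Empty cells simply do not appear in the result, which mirrors the observation that some type/environment combinations have no issues at all.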


    It is possible to make finer partitions on the horizontal and/or vertical axis by adding attributes at a second level. The screen in Figure 4 (on the right) shows the addition of the secondary attribute "Assigned To" to the horizontal axis. Note that this attribute also adds very little spread to the results, as few results are assigned to the design team. In fact, most results are assigned to VIATECLA.

    Figure 4: Attribute addition on a 2D plane (with a single dimension on the horizontal axis, left | with two dimensions on the horizontal axis, right)

    Figure 5 shows a similar example of cross-checking two attributes on the horizontal and vertical axes, reflecting a real example from a VIATECLA client that participated in the prototype validation. In this case, the contents have been cross-checked by the attributes "Area" (horizontal axis) and "Assigned To" (vertical axis), i.e. requests assigned to VIATECLA employees by project area.

    Figure 5: Graphic representation of contents on the horizontal dimension "Area" and vertical dimension "Assigned Person"


    Note that, in addition to expressing attribute dimensions through the horizontal and vertical axes, it is also possible to represent them through colour, shape and size. This is specified in the "Visuals" area of the component. For representation through colour, selecting an attribute of an enumerated type presents an option for mapping its values to a colour range (Figure 6). A similar approach is followed for mapping to a range of shapes (Figure 7, on the left) and sizes (Figure 7, on the right).

    Figure 6: Representation of the data dimension through a colour range

    Figure 7: Representation of the data dimensions through shape and size ranges
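    The mapping of an enumerated attribute to a colour or shape range can be pictured as assigning one entry of a palette to each distinct value. The sketch below is only an illustration under that assumption; `build_visual_mapping`, the palettes and the sample data are hypothetical, and the way SMART CP actually chooses its ranges is not described in this paper.

```python
def build_visual_mapping(contents, attr, palette):
    """Assign one palette entry to each distinct value of an enumerated attribute."""
    values = sorted({item[attr] for item in contents if attr in item})
    # Assumption: cycle through the palette when there are more values than entries.
    return {value: palette[i % len(palette)] for i, value in enumerate(values)}


# Hypothetical sample data.
issues = [
    {"title": "A", "Priority": "Major"},
    {"title": "B", "Priority": "Minor"},
    {"title": "C", "Priority": "Major"},
    {"title": "D", "Priority": "Critical"},
]

colours = build_visual_mapping(issues, "Priority", ["red", "orange", "green"])
shapes = build_visual_mapping(issues, "Priority", ["circle", "square"])

# Each content icon can then be drawn with the visuals of its own value.
icon_colours = {item["title"]: colours[item["Priority"]] for item in issues}
```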

    The configuration section of the SMART Magic Board also includes a filter area that limits the universe of contents (regardless of the shape and format in which contents are represented). Similarly to the configuration of the "Visuals" area, this field allows the user to select one attribute and define which of its values should (or should not) be considered by the filter.

    Finally, when more than two attributes are to be added to the horizontal or vertical axis, or when the two selected attributes have a low spread level, which would result in a very high number of combinations, it is possible to drill down into a specific quadrant of the universe, selected by the user. Thus, considering the example in Figure 8 (on the left), if the user selects the quadrant in the top left corner, a drill-down is performed and the universe of results becomes that of the selected quadrant, at the lower level (Figure 8, on the right). The attributes previously selected (i.e. "Environment" and "Type") are fixed, and the user can drag other attributes to the horizontal and vertical axes in order to further decompose/explore the information present at this level.

    Figure 8: Drill-down mechanism application to the data universe
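    In data terms, a drill-down of this kind restricts the universe to the contents of one quadrant and then lets a new attribute partition that smaller universe. A minimal sketch follows, assuming contents are dictionaries; the helper `drill_down` and the sample issues are hypothetical.

```python
def drill_down(contents, fixed):
    """Restrict the universe to contents matching every fixed attribute value."""
    return [
        item for item in contents
        if all(item.get(attr) == value for attr, value in fixed.items())
    ]


# Hypothetical sample data.
issues = [
    {"title": "A", "Environment": "development", "Type": "task", "Assigned To": "VIATECLA"},
    {"title": "B", "Environment": "development", "Type": "task", "Assigned To": "Design"},
    {"title": "C", "Environment": "development", "Type": "bug", "Assigned To": "VIATECLA"},
    {"title": "D", "Environment": "production", "Type": "task", "Assigned To": "VIATECLA"},
    {"title": "E", "Environment": "development", "Type": "task", "Assigned To": "VIATECLA"},
]

# The selected quadrant fixes the attributes already placed on the axes.
quadrant = {"Environment": "development", "Type": "task"}
sub_universe = drill_down(issues, quadrant)

# Inside the quadrant, a further attribute can be explored.
by_assignee = {}
for item in sub_universe:
    by_assignee.setdefault(item["Assigned To"], []).append(item["title"])
```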

    SMART Elastic

    The SMART Elastic component allows the creation of dynamic filters that can be applied to contents whose structure is not previously known, as well as determining how the resulting contents should appear on screen (i.e. which fields of information are shown).

    For the configuration itself (Figure 9), the user is asked which attribute fields of the object should be used as dynamic/elastic filters, which attribute fields should appear on the form that represents a content result, and which field should be used for sorting/serializing the contents. In this example, selecting the attributes "Status", "Project", "Area", "Environment", "Priority" and "Assigned Persons" as filtering fields, and the attribute "Title" as a detail field, results in Figure 10.

    The expression "Elastic" associated with the component arises from the fact that, when a value is selected in one of the filtering dimensions, the values of the remaining filters and the filtered contents are recalculated in an elastic way: in the remaining filters, values that can further filter the contents are kept, and all values for which no contents exist are removed.

    In this way, Boolean filtering rules can be created with the AND operator, by selecting values in different filter columns, and with the OR operator, when more than one attribute value is selected in the same filter column.
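    The AND/OR semantics and the elastic recalculation of the remaining filter values can be sketched as follows. This is an illustration, not the component's code: the function names and the sample data are hypothetical, and one assumption goes beyond the text, namely that the values offered in a column are computed from the selections made in the other columns, so that alternatives in the same column stay selectable for an OR.

```python
def matches(item, selection):
    """AND across filter columns, OR among the values selected in one column."""
    return all(
        item.get(attr) in values
        for attr, values in selection.items()
        if values  # an empty selection means the column does not filter
    )


def elastic_filter(contents, selection, filter_attrs):
    """Return the filtered contents and the values still offered per column."""
    results = [item for item in contents if matches(item, selection)]
    remaining = {}
    for attr in filter_attrs:
        # Assumption: ignore the column's own selection when listing its values.
        others = {a: v for a, v in selection.items() if a != attr}
        remaining[attr] = sorted(
            {item[attr] for item in contents if attr in item and matches(item, others)}
        )
    return results, remaining


# Hypothetical sample data.
issues = [
    {"title": "A", "Priority": "Major", "Project": "K4T Mobile Apps", "Status": "Open"},
    {"title": "B", "Priority": "Major", "Project": "Website", "Status": "NotAnIssue"},
    {"title": "C", "Priority": "Minor", "Project": "K4T Mobile Apps", "Status": "Open"},
    {"title": "D", "Priority": "Minor", "Project": "Website", "Status": "Closed"},
]
columns = ["Priority", "Project", "Status"]

results, remaining = elastic_filter(issues, {"Priority": {"Major"}}, columns)

# OR inside the Priority column, AND with the Project column.
or_results, _ = elastic_filter(
    issues, {"Priority": {"Major", "Minor"}, "Project": {"Website"}}, columns
)
```

    With "Major" selected, the status "Closed" disappears from the Status column because no major issue carries it, which is the elastic behaviour described above.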


    Figure 9: Initial screen for the dimension specification for the SMART Elastic component

    Figure 10: Initial result presentation with no filter configuration

    Since this component is strongly oriented towards exploring results, mainly by trial and error, it is essential that response times are very short. With this in mind, the data used by the component for presentation are not the actual raw contents, but a set of indexes and pre-aggregated values made available by the SMART Aggregation server component. The computation and aggregation effort is thus carried out incrementally at the time contents are created and updated, and at visualization time the data only needs to be presented, as it has already been processed.
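    The idea of paying the aggregation cost when contents are written, rather than when they are viewed, can be illustrated with a small counter index. This is a sketch under stated assumptions: the class `AggregationIndex` and its methods are hypothetical and do not describe the actual SMART Aggregation component.

```python
from collections import defaultdict


class AggregationIndex:
    """Keeps per-attribute value counts up to date as contents change."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.contents = {}

    def upsert(self, content_id, attributes):
        """Create or update a content, adjusting the counts incrementally."""
        previous = self.contents.get(content_id)
        if previous is not None:
            # Remove the contribution of the previous version first.
            self._apply(previous, -1)
        self.contents[content_id] = dict(attributes)
        self._apply(attributes, +1)

    def _apply(self, attributes, delta):
        for attr, value in attributes.items():
            self.counts[attr][value] += delta
            if self.counts[attr][value] == 0:
                del self.counts[attr][value]

    def count(self, attr, value):
        # Read-time lookup only: no scan over the contents is needed.
        return self.counts[attr].get(value, 0)


index = AggregationIndex()
index.upsert(1, {"Priority": "Major", "Status": "Open"})
index.upsert(2, {"Priority": "Major", "Status": "Open"})
index.upsert(2, {"Priority": "Minor", "Status": "Closed"})  # update of content 2
```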

    As an example, changing the filtering rule "Priority" (with value "Major") AND "Project" (with value "K4T Mobile Apps") to "Priority" (with value "Major") AND "Workflow Status" (with value "NotAnIssue"), shown in Figure 11, is carried out with two interactions (removal of the filter "Project" and addition of the filter "Workflow Status"), reducing the number of results obtained from 536 to 6 within 1 to 2 seconds.

    Figure 11: Result presentation upon dimension filtering

    SMART Timeline

    The SMART Timeline component specializes in the representation of contents on a time axis. However, it follows a different approach: it does not represent the content according to one of its date-type attributes, but according to the dates when changes to the content occurred (either at the attribute/field level or through workflow changes).

    The component addresses a very important question regarding accountability in content handling, which is sometimes neglected at the level of the content manager, or addressed by presenting a view of the information that is too technical (e.g. in the form of log files). With it, it is possible to inspect which contents have been amended or had changes of status/workflow, presenting the information as it was before and after the changes in a graphic form, where it is easy to identify who made which change and when on the time axis.

    Figure 12 (on the left) shows a graphic presentation of the timeline. By default, with no option selected on the component, it delivers a representation where time events are distributed evenly along the timeline. At the top of the screen (left side) it is possible to view the start and end dates of the amendments and the period of time between those dates (this indicator can be of great importance in the case of Service Level Agreement contract clauses).
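    The information the timeline works with (who changed what, when, with the values before and after) and the two indicators mentioned above (even distribution of events, and the period between the first and last amendment) can be sketched as below. The `ChangeEvent` record, the function names and the sample events are hypothetical; the sketch assumes at least one event.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeEvent:
    when: datetime
    who: str
    field: str
    before: str
    after: str


def summarize(events):
    """Order events in time and report the span between first and last change."""
    ordered = sorted(events, key=lambda e: e.when)
    start, end = ordered[0].when, ordered[-1].when
    return ordered, start, end, end - start


def even_positions(count):
    """Default layout: spread events evenly on a 0..1 axis, ignoring real gaps."""
    if count == 1:
        return [0.5]
    return [i / (count - 1) for i in range(count)]


# Hypothetical change history of one content.
events = [
    ChangeEvent(datetime(2015, 3, 4, 17, 30), "user_a", "Workflow Status", "Open", "Closed"),
    ChangeEvent(datetime(2015, 3, 2, 9, 0), "user_a", "Priority", "Minor", "Major"),
    ChangeEvent(datetime(2015, 3, 2, 11, 0), "user_b", "Assigned To", "user_a", "user_b"),
]

ordered, start, end, elapsed = summarize(events)
positions = even_positions(len(ordered))

# Highlighting the actions of a single user, as in Figure 14.
by_user_a = [e.field for e in ordered if e.who == "user_a"]
```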


    Figure 14: Graphic representation of time occurrences (highlight of actions by the same user)

    Figure 15: Inspection of changes made to a certain content

    Evaluation and Future work

    The correctness of the SMART CP prototype can only be assessed through its effective use. During the final stages of its development, a pilot was made available so that the platform could be refined according to the feedback collected.

    As the scope of the test pilot, a ticketing system already implemented at VIATECLA, named One System, was selected (communication between client and supplier). The One System is a collaborative tool aimed at improving productivity, where the client and the development team raise issues and can give adequate follow-up to each situation, which may (or may not) be critical to the business, within the context of the client's project. This solution simplifies communication with the development, operation and project teams.

    Clients have at their disposal facilities that allow them to communicate in real time about technical questions, request clarification about the use of functionalities offered by all VIATECLA platforms, or simply submit general comments.

    The test pilot was a major success, exceeding all initial expectations. According to the feedback received about the platform usage, both the technical teams (not involved with the initial project) and the VIATECLA clients that participated informally in the project validation were willing to keep using this environment in a more operational way after the validation was finished. This constitutes recognition of the added value the SMART CP platform brings.

    As the test pilot focused on the administration of a high volume of contents (e.g. issues) with different levels of priority, the components SMART Views, SMART Elastic, SMART Magic Board, SMART Navigation and SMART Timeline were the ones that received the most positive feedback, because of their impact on information management: they make it more visual and comprehensible, allowing users to explore it through drill-down tools and multi-filter criteria.

    Future work on SMART CP includes the definition of strategies for launching the platform on the market, aiming at obtaining more and better feedback from the effective use of the platform for the improvement and innovation of the work carried out.

    References

    [1] Microsite SMART CP. 2015, http://www.viatecla.com/inovacao/smart_content_provider

    [2] VIATECLA, Institutional website. 2015, http://www.viatecla.com

    [3] University of Évora, Institutional website. 2015, http://www.uevora.pt/

    [4] GTE, Institutional website. 2015, http://www.gte.pt/

    [5] National Strategic Reference Framework (NSRF), Institutional website. 2015, http://www.qren.pt/np4/home

    [6] Clérigo, Filipe. Raminhos, Ricardo. Estevão, Rui. Gonçalves, Teresa. Melgueira, Pedro.: SMART Content Provider, 2015

    [7] Gonçalves, Teresa. Melgueira, Pedro. Clérigo, Filipe. Raminhos, Ricardo. Estevão, Rui.: Data Clustering for heterogeneous data, 2015

    [8] Draper, G.; Livnat, Y.; Riesenfeld, R.F., "A Survey of Radial Methods for Information Visualization," in Visualization and Computer Graphics, IEEE Transactions on, vol.15, no.5, pp.759-776, Sept.-Oct. 2009, doi: 10.1109/TVCG.2009.23

    [9] Diehl, S.; Beck, F.; Burch, M., "Uncovering Strengths and Weaknesses of Radial Visualizations---an Empirical Approach," in Visualization and Computer Graphics, IEEE Transactions on, vol.16, no.6, pp.935-942, Nov.-Dec. 2010, doi: 10.1109/TVCG.2010.209

    [10] Bruls, Mark. Huizing, Kees. Van Wijk, Jarke J.: Squarified Treemaps, in Book: Data Visualization 2010, pages: 33-42. Eurographics, Springer Vienna


    [11] Benjamin B. Bederson, Ben Shneiderman, and Martin Wattenberg. 2002. Ordered and quantum treemaps: Making effective use of 2D space to display hierarchies. ACM Trans. Graph. 21, 4 (October 2002), 833-854. DOI=10.1145/571647.571649

    [12] Benjamin B. Bederson. PhotoMesa: a zoomable image browser using quantum treemaps and bubblemaps. In Proceedings of the 14th annual ACM symposium on User interface software and technology (UIST '01). ACM, New York, NY, USA, 71-80. DOI=10.1145/502348.502359

    [13] Ben Shneiderman. Tree visualization with tree-maps: 2-d space-filling approach. ACM Trans. Graph. 11, 1, 92-99. DOI=10.1145/102377.115768

    [14] Steven Noel and Sushil Jajodia. 2004. Managing attack graph complexity through visual hierarchical aggregation. In Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security (VizSEC/DMSEC '04). ACM, New York, NY, USA, 109-118. DOI=10.1145/1029208.1029225

    [15] Wakimoto, Kazumasa. Taguri, Masaaki.: Constellation graphical method for representing multi-dimensional data. Annals of the Institute of Statistical Mathematics, Kluwer Academic Publishers

    [16] Krzywinski, Martin. Birol, Inanc. JM Jones, Steven. Marra, Marco A.: Hive plots - rational approach to visualizing networks. Brief Bioinform (2012) 13 (5): 627-644, first published online December 9, 2011, doi:10.1093/bib/bbr069

    [17] Braun, Lothar and Volke, Mario and Schlamp, Johann and von Bodisco, Alexander and Carle, Georg.: Flow-inspector: a framework for visualizing network flow data using current web technologies. Springer Vienna

    [18] Han-Ming Wu, Yin-Jing Tien, Chun-houh Chen, GAP: A graphical environment for matrix visualization and cluster analysis, Computational Statistics & Data Analysis, Volume 54, Issue 3, 1 March 2010, Pages 767-778, ISSN 0167-9473

    [19] Henry, Nathalie. Fekete, Jean-Daniel.: MatLink: Enhanced Matrix Visualization for Analyzing Social Networks. Lecture Notes in Computer Science, Human-Computer Interaction - INTERACT 2007