Uncovering Clusters in Crowded Parallel Coordinates Visualizations
![Page 1: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/1.jpg)
Uncovering Clusters in Crowded Parallel Coordinates Visualizations
Almir Olivette Artero, Maria Cristina Ferreira de Oliveira, Haim Levkowitz
Information Visualization 2004
![Page 2: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/2.jpg)
Abstract
• The idea is inspired by traditional image processing techniques such as grayscale manipulation.
• The goal is to reduce visual clutter and let the analyst observe relevant patterns in the parallel coordinates plot.
![Page 3: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/3.jpg)
Introduction
• The strong overlapping of graphical markers hampers the user’s ability to identify patterns in the data when the number of records and the dimensionality of the data set are high.
• It is important to avoid displaying irrelevant information and to enhance the presentation of the useful information.
![Page 4: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/4.jpg)
Introduction
• The paper tackles this problem with a strategy that computes frequency and density information and uses them in parallel coordinates visualizations to filter the information presented to the user.
![Page 5: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/5.jpg)
Frequency Information
• The frequency function for an n-dimensional variable x is defined as:
where h is the bin size, σ is the number of records that fall in the same bin as x, and m is the total number of records.
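Assuming the standard multivariate histogram estimator, which uses exactly the symbols h, σ, m and n named here, the frequency function would take the following form. This is a reconstruction under that assumption, not a formula taken from the slide:

```latex
\hat{f}(\mathbf{x}) = \frac{\sigma}{m \, h^{n}}
```

Under this form, a bin holding more records yields a proportionally higher frequency, and the division by m h^n normalizes the estimate so that it integrates to one over the data space.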
![Page 6: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/6.jpg)
Frequency Information
• A two-dimensional matrix is generated to store the frequency of each pair of attribute values, which is then used to draw the polygonal lines for the records in the data set.
• For a data set with n attributes, n-1 frequency matrices are generated, one for each pair of adjacent attributes (neighboring axes).
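The construction of the n-1 matrices can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `frequency_matrices`, the `bins` parameter, and the min/max normalization used to map values to bin indices are all assumptions made for the example.

```python
def frequency_matrices(records, bins):
    """Build one bins x bins count matrix per pair of adjacent attributes.

    `records` is a list of equal-length numeric rows. Binning by the
    per-attribute min/max range is an assumption of this sketch.
    """
    n = len(records[0])
    lows = [min(r[k] for r in records) for k in range(n)]
    highs = [max(r[k] for r in records) for k in range(n)]

    def bin_index(value, k):
        span = highs[k] - lows[k]
        if span == 0:
            return 0  # constant attribute: everything lands in bin 0
        # Clamp so that the maximum value falls in the last bin.
        return min(int(bins * (value - lows[k]) / span), bins - 1)

    # matrices[k][i][j] counts records in bin i on axis k and bin j on axis k+1.
    matrices = [[[0] * bins for _ in range(bins)] for _ in range(n - 1)]
    for r in records:
        idx = [bin_index(r[k], k) for k in range(n)]
        for k in range(n - 1):
            matrices[k][idx[k]][idx[k + 1]] += 1
    return matrices
```

Each matrix entry is a plain integer counter, so the accumulation step itself needs no floating-point work once the bin indices are known.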
![Page 7: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/7.jpg)
Frequency Information
• Each non-zero matrix element generates a line segment in the visualization, and its value determines the pixel intensity used to draw that segment.
• Each line segment is drawn with the Bresenham algorithm:
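For reference, the textbook integer form of the Bresenham line algorithm is sketched below. The helper `draw_segment`, which adds a weight to every pixel on the line, is a hypothetical illustration of how a frequency value could drive pixel intensity; the slides do not show how the authors accumulate intensities.

```python
def bresenham(x0, y0, x1, y1):
    """Return the integer pixel coordinates on the line from (x0, y0) to (x1, y1)."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # running error term, integers only
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step along x
            err += dy
            x0 += sx
        if e2 <= dx:  # step along y
            err += dx
            y0 += sy
    return points


def draw_segment(image, x0, y0, x1, y1, weight):
    """Add `weight` to every pixel of the segment (hypothetical accumulation rule)."""
    for x, y in bresenham(x0, y0, x1, y1):
        image[y][x] += weight
```

The loop uses only integer additions and comparisons, which is why the algorithm is a natural fit for a rendering pipeline built on integer arithmetic.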
![Page 8: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/8.jpg)
Interactive Parallel Coordinates Frequency and Density plots
• The intensity of the pixel with coordinates (q,p) is given by:
• A square-wave smoothing filter is applied to each pixel:
![Page 9: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/9.jpg)
Interactive Parallel Coordinates Frequency and Density plots
• S is a scaling factor.
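A square-wave (box) smoothing filter replaces each pixel with the mean of its neighborhood. The sketch below is only an illustration under assumptions: the window size (`radius`), the border handling, and the way the scaling factor enters (here a plain multiplier named `scale`, standing in for S) are not given in the transcript.

```python
def box_smooth(image, radius=1, scale=1.0):
    """Mean-filter a 2D list of intensities and multiply the result by `scale`.

    Pixels outside the image are ignored, so border pixels average over
    a smaller window (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for p in range(rows):
        for q in range(cols):
            total = 0.0
            count = 0
            for dp in range(-radius, radius + 1):
                for dq in range(-radius, radius + 1):
                    pp, qq = p + dp, q + dq
                    if 0 <= pp < rows and 0 <= qq < cols:
                        total += image[pp][qq]
                        count += 1
            out[p][q] = scale * total / count
    return out
```

A larger `radius` spreads isolated bright pixels over a wider area, which is the effect that turns a sparse frequency plot into a smoother, density-like image.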
![Page 10: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/10.jpg)
Density Information
• The density function for an n-dimensional variable x is defined as:
where d_i is the i-th record of the data set, K is the kernel function, and the remaining parameter defines a smoothing factor, or bandwidth.
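The description matches the usual kernel density estimator, f(x) = (1 / (m h^n)) Σ K((x - d_i) / h). The sketch below evaluates it under two assumptions that the transcript does not confirm: the bandwidth is called `h`, and K is a Gaussian kernel.

```python
import math


def gaussian_kernel(u):
    """Standard n-dimensional Gaussian kernel (assumed choice of K)."""
    n = len(u)
    return math.exp(-0.5 * sum(c * c for c in u)) / (2 * math.pi) ** (n / 2)


def density(x, records, h):
    """Kernel density estimate at point x with bandwidth h (assumed symbol)."""
    m = len(records)
    n = len(x)
    total = 0.0
    for d in records:
        # Scaled offset between the query point and the i-th record.
        u = [(xi - di) / h for xi, di in zip(x, d)]
        total += gaussian_kernel(u)
    return total / (m * h ** n)
```

A small `h` produces a spiky estimate that follows individual records, while a large `h` blurs nearby clusters together, so the bandwidth acts as the smoothing control described in the text.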
![Page 11: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/11.jpg)
Visualizations of the Pollen data: a) frequency plot, b) density plot.
![Page 12: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/12.jpg)
Interactive high-dimensional clustering with IPC plot
![Page 13: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/13.jpg)
Interactive high-dimensional clustering with IPC plot
![Page 14: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/14.jpg)
Interactive high-dimensional clustering with IPC plot
![Page 15: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/15.jpg)
Interactive high-dimensional clustering with IPC plot
![Page 16: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/16.jpg)
Interactive high-dimensional clustering with IPC plot
![Page 17: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/17.jpg)
Performance
• Running times, in seconds, of the proposed algorithm for different values of m (number of records) and n (number of attributes).
![Page 18: Uncovering Clusters in Crowded Parallel Coordinates Visualizations](https://reader036.fdocuments.us/reader036/viewer/2022070422/56816456550346895dd6231a/html5/thumbnails/18.jpg)
Conclusions
• The new plots support interactive exploration of large, high-dimensional data sets, allowing users to remove noise and highlight areas with a high concentration of data.
• The proposed algorithms use only integer arithmetic to compute the frequency matrices.