Introduction to Data Visualization Definition of Data Visualization Terms related to Data...
emory-hunter -
Lecture 31
Introduction to Data Visualization
- Definition of Data Visualization
- Terms related to Data Visualization
  - Data Mining
  - Data Recovery
  - Data Redundancy
  - Data Acquisition
  - Data Validation
  - Data Integrity
  - Data Verification
  - Data Aggregation
Continued….
- Data mining
  - analytic process designed to explore data
  - analyzing data from different perspectives
  - summarizing it into useful information
- Data recovery
  - retrieving data from damaged, failed, corrupted, or inaccessible secondary storage media
  - recovery is required because of physical damage to the storage device or logical damage to the file system
Continued….
- Data redundancy
  - data stored in addition to the actual data
  - permits correction of errors
- Data acquisition
  - process of sampling signals that measure real-world physical conditions
  - converting the resulting samples into digital numeric values
- Data validation
  - process of ensuring that a program operates on clean, correct and useful data
Continued….
- Data integrity
  - maintaining and assuring the accuracy and consistency of data
  - ensuring data is recorded exactly as intended
- Data verification
  - different types of data are checked for accuracy and inconsistencies after data migration is done
- Data aggregation
  - information is gathered and expressed in a summary form
  - to get more information about particular groups
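Two of the terms above, data validation and data aggregation, can be illustrated with a short C++ sketch. The Sale record, its fields, and the rule for what counts as "clean" data are assumptions made for this example only; they do not come from the lecture.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record type: one sale attributed to a region.
struct Sale {
    std::string region;
    double amount;
};

// Data validation (assumed rule): a record is usable only if it names
// a region and its amount is not negative.
bool isValid(const Sale& s) {
    return !s.region.empty() && s.amount >= 0.0;
}

// Data aggregation: express the valid records in summary form,
// here as one total per region.
std::map<std::string, double> totalByRegion(const std::vector<Sale>& sales) {
    std::map<std::string, double> totals;
    for (const Sale& s : sales) {
        if (isValid(s)) {
            totals[s.region] += s.amount;  // a missing key starts at 0.0
        }
    }
    return totals;
}
```

Records that fail validation are skipped rather than repaired, which is only one possible policy.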
Continued….
- Need for data visualization
- Importance of data visualization
- Limitation of spreadsheet
- Interpretation through data visualization
  - identify areas that need attention or improvement
  - understand which factors influence system design
  - predict how to change the system design accordingly
  - predict the efficiency of the system
- Interactive Visualization
  - Humans interact with computers to create graphic illustrations of information
  - Process can be made more efficient
    - Human input
    - Response time
Continued….
- Combination of disciplines
  - providing a meaningful solution through data visualization requires insights from diverse fields such as statistics, data mining, graphic design, and information visualization
  - software-based information visualization adds building blocks for interacting with and representing various kinds of abstract data
Continued….
- Process of data visualization
  - Acquire
  - Parse
  - Filter
  - Mine
  - Represent
  - Refine
  - Interact
- Acquire
  - Obtain the data, whether from a file on a disk or a source over a network
- Parse
  - Provide some structure for the data’s meaning, and order it into categories
- Filter
  - Remove all but the data of interest
- Mine
  - Apply methods from statistics or data mining as a way to discern patterns or place the data in mathematical context
- Represent
  - Choose a basic visual model, such as a bar graph, list, or tree.
- Refine
  - Improve the basic representation to make it clearer and more visually engaging.
- Interact
  - Add methods for manipulating the data or controlling what features are visible.
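The seven steps can be sketched as a small chain of C++ functions. This is a minimal illustration under stated assumptions, not the lecture's own code: the "label,value" text format, the Reading type, and all function names are hypothetical, and the "visual model" is a plain text bar chart.

```cpp
#include <algorithm>
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: a label with one numeric measurement.
struct Reading {
    std::string label;
    double value;
};

// Acquire + parse: the raw text stands in for a file or network source.
// Each line is assumed to look like "label,value"; malformed lines are skipped.
std::vector<Reading> parseReadings(const std::string& raw) {
    std::vector<Reading> out;
    std::istringstream lines(raw);
    std::string line;
    while (std::getline(lines, line)) {
        const std::size_t comma = line.find(',');
        if (comma == std::string::npos) continue;
        try {
            out.push_back({line.substr(0, comma), std::stod(line.substr(comma + 1))});
        } catch (const std::exception&) {
            // the value was not a number: drop the line
        }
    }
    return out;
}

// Filter: remove all but the data of interest (here, values >= minValue).
// Interact: minValue is the knob a viewer could change.
std::vector<Reading> keepAtLeast(const std::vector<Reading>& in, double minValue) {
    std::vector<Reading> out;
    for (const Reading& r : in) {
        if (r.value >= minValue) out.push_back(r);
    }
    return out;
}

// Mine: a very simple statistic, the largest value (0 if none is positive).
double largestValue(const std::vector<Reading>& in) {
    double top = 0.0;
    for (const Reading& r : in) top = std::max(top, r.value);
    return top;
}

// Represent + refine: a text bar chart whose longest bar is `width` characters.
std::string barChart(const std::vector<Reading>& in, int width) {
    std::ostringstream chart;
    const double top = largestValue(in);
    for (const Reading& r : in) {
        int len = top > 0.0 ? static_cast<int>(r.value / top * width + 0.5) : 0;
        len = std::max(len, 0);  // negative values get an empty bar
        chart << r.label << " | " << std::string(static_cast<std::size_t>(len), '#') << '\n';
    }
    return chart.str();
}
```

Calling barChart(keepAtLeast(parseReadings(text), 2.0), 4) on the lines "a,2", "b,4", "c,1" keeps a and b and draws bars of two and four characters. In practice the steps are revisited repeatedly rather than run once in order.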
Continued….
- Iteration and Combination of steps of data visualization
- Unique requirements for each project
  - each data set is different
  - the point of visualization is to expose that fascinating aspect of the data and make it self-evident
  - readily available representation toolkits are useful starting points
  - they must be customized during an in-depth study of the task
Continued….
- Avoid usage of excess data
- Audience of problem
- Quantitative messages
  - Time-Series
  - Ranking
  - Part-to-Whole
  - Deviation
  - Frequency-Distribution
  - Correlation
  - Nominal Comparison
  - Geographic or Geospatial
- Time-series:
  - A single variable is captured over a period of time, such as the unemployment rate over a 10-year period.
  - A line chart may be used to demonstrate the trend.
- Ranking:
  - Categorical subdivisions are ranked in ascending or descending order, such as a ranking of sales performance by salespersons during a single period.
  - A bar chart may be used to show the comparison across the salespersons.
- Part-to-whole:
  - Categorical subdivisions are measured as a ratio to the whole.
  - A pie chart or bar chart can show the comparison of ratios, such as the market share represented by competitors in a market.
- Deviation:
  - Categorical subdivisions are compared against a reference, such as a comparison of actual vs. budget expenses for several departments of a business for a given time period.
  - A bar chart can show comparison of the actual versus the reference amount.
- Frequency distribution:
  - Shows the number of observations of a particular variable for a given interval, such as the number of years in which the stock market return falls in intervals such as 0-10%, 11-20%, etc.
  - A histogram, a type of bar chart, may be used for this analysis.
  - A boxplot helps visualize key statistics about the distribution, such as the median and quartiles (some variants also mark the mean).
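The counting behind a histogram, and the median that a boxplot marks, can be sketched in C++ as follows. The function names and the choice of equal-width, half-open intervals are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many values fall into each of `bins` equal-width intervals
// [lo, lo + width), [lo + width, lo + 2 * width), and so on.
// Values outside the covered range are ignored.
std::vector<int> binCounts(const std::vector<double>& values,
                           double lo, double width, std::size_t bins) {
    assert(width > 0.0 && bins > 0);
    std::vector<int> counts(bins, 0);
    const double hi = lo + width * static_cast<double>(bins);
    for (double v : values) {
        if (v < lo || v >= hi) continue;
        const std::size_t i = static_cast<std::size_t>((v - lo) / width);
        if (i < bins) ++counts[i];  // guards against rounding at the top edge
    }
    return counts;
}

// Median of a non-empty sample; the vector is taken by value so the
// caller's data is not reordered by the sort.
double median(std::vector<double> values) {
    assert(!values.empty());
    std::sort(values.begin(), values.end());
    const std::size_t n = values.size();
    return n % 2 == 1 ? values[n / 2]
                      : (values[n / 2 - 1] + values[n / 2]) / 2.0;
}
```

The bar heights of the histogram are exactly the returned counts; how the intervals are chosen can change the picture considerably.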
- Correlation:
  - Comparison between observations represented by two variables (X,Y) to determine if they tend to move in the same or opposite directions.
  - For example, plotting unemployment (X) and inflation (Y) for a sample of months.
  - A scatter plot is typically used for this message.
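Whether two variables "move in the same or opposite directions" is commonly quantified with the Pearson correlation coefficient. The slides do not name a specific measure, so the choice of Pearson's r here is an assumption, as is the function name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient of paired observations (x[i], y[i]).
// The result lies between -1 and +1: positive when the variables tend to
// move together, negative when they tend to move in opposite directions.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());

    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= n;
    meanY /= n;

    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - meanX;
        const double dy = y[i] - meanY;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    assert(sxx > 0.0 && syy > 0.0);  // undefined if either variable is constant
    return sxy / std::sqrt(sxx * syy);
}
```

A value near zero only says there is no linear relationship; the scatter plot remains the better tool for spotting other patterns.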
- Nominal comparison:
  - Comparing categorical subdivisions in no particular order, such as the sales volume by product code.
  - A bar chart may be used for this comparison.
- Geographic or geospatial:
  - Comparison of a variable across a map or layout, such as the unemployment rate by state or the number of persons on the various floors of a building.
  - A cartogram is a typical graphic used.
Continued….
- Characteristics of effective graphical display
  - show the data
  - avoid distorting what the data have to say
  - present many numbers in a small space
  - make large data sets coherent
  - encourage the eye to compare different pieces of data
  - reveal the data at several levels of detail, from a broad overview to the fine structure
  - serve a reasonably clear purpose: description, exploration, tabulation or decoration
  - be closely integrated with the statistical and verbal descriptions of a data set
Continued….
- Visual perception and data visualization
  - Effective graphics take advantage of pre-attentive processing and attributes and the relative strength of these attributes
- Types of information display
  - Tables
  - Graphs
- Data display requires planning
- Data collection
- Benefits of data visualization
  - Visualization is so powerful and effective that it can change someone’s mind in a flash
  - It encompasses various datasets quickly, effectively and efficiently and makes them accessible to interested viewers
  - It gives quick access to deep insight
  - It gives us the opportunity to approach huge data and makes it easily comprehensible, be it in the field of entertainment, current affairs, financial issues or political affairs
  - It also builds in us a deep insight, prompting us to take a good decision and an immediate action if needed
  - It has emerged in the business world lately as geospatial visualization
  - The popularity of geospatial visualization has grown because many websites provide web services that attract visitors’ interest
Data Visualization with C++
- Chapter 1 “Arrays, Pointers and Structures”
- Chapter 2 “Objects and Classes”
- Chapter 4 “Inheritance”
- Chapter 6 “Algorithm Analysis”
Chapter 1 “Arrays, Pointers and Structures”
- In this chapter we examined the basics of pointers, arrays, and structures
- The pointer variable emulates the real-life indirect answer. In C++ it is an object that stores the address where some other data reside. The pointer is special because it can be dereferenced, thus allowing access to those other data
- The NULL pointer holds the constant 0, indicating that it is not currently pointing at valid data
- A reference parameter is an alias. It is like a pointer constant, except that the compiler implicitly dereferences it on every access
- With reference variables, three forms of parameter passing are available: call by value, call by reference, and call by constant reference
- Choosing the best form for a particular application is an important part of the design process
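The three parameter-passing forms and a pointer that may be null can be compared side by side in a short sketch. The function names are hypothetical, and the sketch uses nullptr, the modern C++ spelling of the null pointer, where the chapter summary speaks of NULL.

```cpp
#include <vector>

// Call by value: the function receives its own copy, so the caller's
// variable is unchanged.
void incrementCopy(int n) { ++n; }

// Call by reference: n is an alias for the caller's variable, and the
// compiler dereferences it implicitly on every access.
void incrementInPlace(int& n) { ++n; }

// Call by constant reference: no copy of the (possibly large) vector is
// made, and the function cannot modify it.
long sum(const std::vector<int>& values) {
    long total = 0;
    for (int v : values) total += v;
    return total;
}

// Pointer parameter: stores an address and must be dereferenced explicitly.
// A null pointer signals that there is no valid data to work on.
bool incrementThroughPointer(int* p) {
    if (p == nullptr) return false;
    ++*p;
    return true;
}
```

After int a = 5, calling incrementCopy(a) leaves a at 5, incrementInPlace(a) makes it 6, and incrementThroughPointer(&a) makes it 7.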
Continued….
- An array is a collection of identically typed objects
- In C++ there is a primitive version with second-class semantics
- A vector is also part of the standard library
- In both cases, no index range checking is performed by the [] operator, and out-of-bounds array accesses can corrupt other objects
- Because primitive arrays are second-class, they cannot be copied by using the assignment operator
- Instead they must be copied element by element; however, a vector can be copied in a single assignment statement
- A vector can be expanded as needed by calling resize
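The contrast between primitive arrays and std::vector can be sketched as follows. The helper names are hypothetical; the use of vector's at() member, which does check the index, is an addition not mentioned in the summary above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Primitive arrays cannot be assigned to one another, so the copy is
// written out element by element.
void copyArray(const int* from, int* to, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) to[i] = from[i];
}

// A vector is copied by a single assignment and can then grow with resize.
std::vector<int> copyAndGrow(const std::vector<int>& source, std::size_t newSize) {
    std::vector<int> copy;
    copy = source;         // one assignment copies every element
    copy.resize(newSize);  // any added elements are set to 0
    return copy;
}

// operator[] performs no range check; at() throws std::out_of_range
// for an invalid index instead of silently touching other memory.
bool inRange(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

The checked access costs a comparison per call, which is why the unchecked operator remains the default.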
Continued….
- Structures are also used to store several objects, but unlike arrays, the objects need not be identically typed
- Each object in the structure is a member, and is accessed by the . member operator
- The -> operator is used to access a member of a structure that is accessed indirectly through a pointer
- We also noted that a list of items can be stored non-contiguously by using a linked list
- The advantage is that less space is used for large objects than in the array-doubling technique
- The penalty is that access of the ith item is no longer constant-time but requires examination of i structures
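A minimal sketch of a structure, the two member-access operators, and a singly linked list follows. The Student and Node types and the nth function are illustrative names chosen for this example.

```cpp
#include <cstddef>
#include <string>

// A structure groups members that need not share a type.
struct Student {
    std::string name;
    int year;
};

// One node of a singly linked list: a value plus the address of the next node.
struct Node {
    int value;
    Node* next;
};

// Reaching the i-th node (counting from 0) means following i links, so the
// cost grows with i instead of being constant as with array indexing.
// Returns nullptr if the list has fewer than i + 1 nodes.
const Node* nth(const Node* head, std::size_t i) {
    while (head != nullptr && i > 0) {
        head = head->next;  // -> reaches a member through a pointer
        --i;
    }
    return head;
}
```

With Student s{"Ada", 2} and Student* p = &s, the expressions s.year and p->year name the same member; the first uses the object directly and the second goes through the pointer.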
Chapter 2 “Objects and Classes”
- In this chapter we described the C++ class construct
- The class is the C++ mechanism used to create new types. Through it we can
  - define construction and destruction of objects,
  - define copy semantics,
  - define input and output operations,
  - overload almost all operators,
  - define implicit and explicit type conversion operations (sometimes a bad thing),
  - provide for information hiding and atomicity
- The class consists of two parts: the interface and the implementation
- The interface tells the user of the class what the class does. The implementation does it
- The implementation frequently contains proprietary code and in some cases is distributed only in precompiled form
Continued….
- Information hiding can be enforced by using the private section in the interface
- Initialization of objects is controlled by the constructor functions, and the destructor function is called when an object goes out of scope
- The destructor typically performs clean-up work, closing files and freeing memory
- Finally, when implementing a class, the use of const and correct parameter passing mechanisms, as well as the decision about whether to accept a default for the Big Three (destructor, copy constructor, and copy assignment operator), write our own Big Three, or completely disallow copying, is crucial not only for efficiency but also, in some cases, for correctness
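The case where the defaults for the Big Three are not acceptable is a class that owns raw memory. The IntBuffer class below is a hypothetical example written for these notes; it hides its data in the private section and supplies its own destructor, copy constructor, and copy assignment operator so that each object owns a separate array.

```cpp
#include <algorithm>
#include <cstddef>

class IntBuffer {
public:
    // Constructor: controls initialization; every element starts at 0.
    explicit IntBuffer(std::size_t n) : size_(n), data_(new int[n]()) {}

    // Big Three, 1: copy constructor makes a deep copy.
    IntBuffer(const IntBuffer& other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }

    // Big Three, 2: copy assignment, safe against self-assignment.
    IntBuffer& operator=(const IntBuffer& other) {
        if (this != &other) {
            int* fresh = new int[other.size_];
            std::copy(other.data_, other.data_ + other.size_, fresh);
            delete[] data_;  // release the old array only after the copy succeeded
            data_ = fresh;
            size_ = other.size_;
        }
        return *this;
    }

    // Big Three, 3: destructor frees the memory when the object goes out of scope.
    ~IntBuffer() { delete[] data_; }

    // const member functions promise not to change the object.
    std::size_t size() const { return size_; }
    int get(std::size_t i) const { return data_[i]; }
    void set(std::size_t i, int v) { data_[i] = v; }

private:
    // Information hiding: callers cannot reach the raw pointer.
    std::size_t size_;
    int* data_;
};
```

With the compiler-generated defaults, two copies would share one array and both destructors would try to free it, which is the correctness problem the summary alludes to. A class built on std::vector would not need any of the three written by hand.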