Introduction to Data Visualization

Lecture 31





Introduction to Data Visualization
Definition of Data Visualization
Terms related to Data Visualization

Data Mining, Data Recovery, Data Redundancy, Data Acquisition, Data Validation, Data Integrity, Data Verification, Data Aggregation


Data mining: an analytic process designed to explore data, analyzing it from different perspectives and summarizing it into useful information.

Data recovery: the process of salvaging data from damaged, failed, corrupted, or inaccessible secondary storage media. Recovery may be required because of physical damage to the storage device or logical damage to the file system.


Data redundancy: additional data stored alongside the actual data, which permits errors to be detected and corrected.
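As a minimal sketch of how redundancy permits error correction, the code below assumes a triple-repetition code: every bit is stored three times and recovered by majority vote. This scheme is chosen only for illustration (practical systems usually rely on parity, checksums, or Hamming and Reed-Solomon codes), and the names encodeTriple and decodeTriple are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: store every bit three times (a simple repetition code).
std::vector<int> encodeTriple(const std::vector<int>& bits) {
    std::vector<int> out;
    out.reserve(bits.size() * 3);
    for (int b : bits) {
        out.push_back(b);
        out.push_back(b);
        out.push_back(b);
    }
    return out;
}

// Hypothetical helper: recover each bit by majority vote over its three
// copies, which corrects any single flipped copy within a group.
std::vector<int> decodeTriple(const std::vector<int>& coded) {
    std::vector<int> bits;
    for (std::size_t i = 0; i + 2 < coded.size(); i += 3) {
        int ones = coded[i] + coded[i + 1] + coded[i + 2];
        bits.push_back(ones >= 2 ? 1 : 0);
    }
    return bits;
}
```

The redundant copies triple the storage cost; that overhead is the price paid for being able to correct an error rather than merely detect it.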

Data acquisition: the process of sampling signals that measure real-world physical conditions and converting the resulting samples into digital numeric values.
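The sampling-and-conversion step can be sketched as below. This is an illustration under stated assumptions, not a driver for real hardware: the signal is a plain function of time, the converter is assumed to have a fixed input range [vMin, vMax] and a given bit depth, and the name sampleAndQuantize is hypothetical.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: sample a continuous signal at a fixed rate and
// quantize each sample to an unsigned integer code, as an ADC would.
std::vector<int> sampleAndQuantize(double (*signal)(double), double sampleRateHz,
                                   int sampleCount, double vMin, double vMax,
                                   int bits) {
    const int levels = (1 << bits) - 1;   // highest code, e.g. 255 for 8 bits
    std::vector<int> codes;
    codes.reserve(sampleCount);
    for (int n = 0; n < sampleCount; ++n) {
        double t = n / sampleRateHz;      // sampling instant in seconds
        double v = signal(t);
        if (v < vMin) v = vMin;           // clamp to the assumed input range
        if (v > vMax) v = vMax;
        double fraction = (v - vMin) / (vMax - vMin);
        codes.push_back(static_cast<int>(std::lround(fraction * levels)));
    }
    return codes;
}
```

For example, sampling the signal f(t) = t once per second over the range 0 to 2 with 8 bits yields the codes 0, 128, and 255.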

Data validation: the process of ensuring that a program operates on clean, correct, and useful data.
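A minimal validation sketch follows. The Reading record, the helper names isValid and validated, and the accepted temperature range of -90 to 60 degrees Celsius are all hypothetical choices for illustration; real validation rules depend on the data source.

```cpp
#include <cmath>
#include <string>
#include <vector>

// Hypothetical record type used only for this illustration.
struct Reading {
    std::string sensorId;
    double celsius;
};

// A reading is treated as valid if it has a sensor ID and a finite
// temperature inside an assumed plausible range.
bool isValid(const Reading& r) {
    return !r.sensorId.empty() && std::isfinite(r.celsius) &&
           r.celsius >= -90.0 && r.celsius <= 60.0;
}

// Keep only the readings that pass validation, so later steps
// operate on clean, correct data.
std::vector<Reading> validated(const std::vector<Reading>& in) {
    std::vector<Reading> out;
    for (const Reading& r : in)
        if (isValid(r)) out.push_back(r);
    return out;
}
```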


Data integrity: maintaining and assuring the accuracy and consistency of data, so that data is recorded exactly as intended.

Data verification: checking different types of data for accuracy and inconsistencies after a data migration is done.
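One common way to verify a migration is to compare checksums of the records before and after the move. The sketch below assumes records are plain strings and uses the 64-bit FNV-1a hash as a simple, non-cryptographic checksum; the function names checksum and verifyMigration are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// 64-bit FNV-1a hash, used here as a simple (non-cryptographic) checksum.
std::uint64_t checksum(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;   // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                    // FNV prime
    }
    return h;
}

// Compare records before and after a migration; return the positions whose
// checksums differ, plus any positions missing from the shorter side.
std::vector<std::size_t> verifyMigration(const std::vector<std::string>& source,
                                         const std::vector<std::string>& target) {
    std::vector<std::size_t> bad;
    std::size_t n = source.size() > target.size() ? source.size() : target.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i >= source.size() || i >= target.size() ||
            checksum(source[i]) != checksum(target[i]))
            bad.push_back(i);
    }
    return bad;
}
```

A checksum can in principle collide, so a matching checksum strongly suggests, but does not prove, that a record was copied exactly as intended.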

Data aggregation: information is gathered and expressed in a summary form, for example to get more information about particular groups.
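A group-by summary is the most common form of aggregation. The sketch below assumes the input is a list of (group, value) pairs and reports a count, total, and mean per group; the names GroupSummary and aggregate are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Summary statistics for one group.
struct GroupSummary {
    int count = 0;
    double total = 0.0;
    double mean() const { return count ? total / count : 0.0; }
};

// Aggregate (group, value) rows into one summary per group.
// std::map keeps the groups sorted by name.
std::map<std::string, GroupSummary>
aggregate(const std::vector<std::pair<std::string, double>>& rows) {
    std::map<std::string, GroupSummary> summary;
    for (const auto& row : rows) {
        GroupSummary& g = summary[row.first];   // creates the group on first use
        ++g.count;
        g.total += row.second;
    }
    return summary;
}
```

For instance, the rows (north, 10), (south, 4), (north, 20) collapse into two groups: north with a total of 30 and a mean of 15, and south with a total and mean of 4.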


Need for data visualization
Importance of data visualization
Limitation of spreadsheets
Interpretation through data visualization

Data visualization helps to identify areas that need attention or improvement, understand which factors influence the design of a system, predict how to change the system design accordingly, and predict the efficiency of the system.

Interactive visualization: humans interact with computers to create graphic illustrations of information. The process can be made more efficient, with two aspects to consider: human input and response time.


Combination of disciplines

Providing a meaningful solution through data visualization requires insights from diverse fields such as statistics, data mining, graphic design, and information visualization.

Software-based information visualization adds building blocks for interacting with and representing various kinds of abstract data.


Process of data visualization

Acquire, Parse, Filter, Mine, Represent, Refine, Interact


Acquire: obtain the data, whether from a file on a disk or a source over a network.

Parse: provide some structure for the data's meaning, and order it into categories.

Filter: remove all but the data of interest.

Mine: apply methods from statistics or data mining as a way to discern patterns or place the data in mathematical context.


Represent: choose a basic visual model, such as a bar graph, list, or tree.

Refine: improve the basic representation to make it clearer and more visually engaging.

Interact: add methods for manipulating the data or controlling what features are visible.
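The first five steps can be sketched as a chain of small C++ functions. This is a minimal illustration under assumptions, not a reference implementation: the input is assumed to be "city,year,value" lines, "Mine" is reduced to computing a mean, "Represent" draws a text bar chart with one '#' per unit (so values are assumed non-negative), and Refine and Interact are left out because they depend on a graphics toolkit. The Row type and all function names are hypothetical, and std::stoi or std::stod will throw on malformed numbers.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Row { std::string city; int year; double value; };   // hypothetical record

// Acquire: read raw lines from any stream (a file or network source in practice).
std::vector<std::string> acquire(std::istream& in) {
    std::vector<std::string> lines;
    for (std::string line; std::getline(in, line);)
        if (!line.empty()) lines.push_back(line);
    return lines;
}

// Parse: give each "city,year,value" line a typed structure.
std::vector<Row> parse(const std::vector<std::string>& lines) {
    std::vector<Row> rows;
    for (const std::string& line : lines) {
        std::istringstream fields(line);
        std::string city, year, value;
        if (std::getline(fields, city, ',') && std::getline(fields, year, ',') &&
            std::getline(fields, value))
            rows.push_back(Row{city, std::stoi(year), std::stod(value)});
    }
    return rows;
}

// Filter: keep only the rows of interest.
std::vector<Row> filter(const std::vector<Row>& rows, const std::string& city) {
    std::vector<Row> out;
    for (const Row& r : rows)
        if (r.city == city) out.push_back(r);
    return out;
}

// Mine: a simple statistic (the mean) over the filtered values.
double mine(const std::vector<Row>& rows) {
    double sum = 0.0;
    for (const Row& r : rows) sum += r.value;
    return rows.empty() ? 0.0 : sum / rows.size();
}

// Represent: a basic visual model, here a text bar chart (one '#' per unit).
std::string represent(const std::vector<Row>& rows) {
    std::ostringstream out;
    for (const Row& r : rows)
        out << r.year << ' ' << std::string(static_cast<std::size_t>(r.value), '#') << '\n';
    return out.str();
}
```

Keeping each step as its own function makes it easy to iterate on one stage, for example swapping the filter or the statistic, without touching the others.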


Iteration and combination of steps of data visualization

Unique requirements for each project: each data set is different, and the point of visualization is to expose whatever is fascinating about the data and make it self-evident. Readily available representation toolkits are useful starting points, but they must be customized during an in-depth study of the task.


Avoid usage of excess data
Audience of the problem
Quantitative messages: time-series, ranking, part-to-whole, deviation, frequency distribution, correlation, nominal comparison, and geographic or geospatial


Time-series: a single variable is captured over a period of time, such as the unemployment rate over a 10-year period. A line chart may be used to demonstrate the trend.

Ranking: categorical subdivisions are ranked in ascending or descending order, such as a ranking of sales performance by salesperson during a single period. A bar chart may be used to show the comparison across the salespersons.
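Producing a ranking is essentially a descending sort on the measured value. The sketch below assumes each row is a (salesperson, amount) pair; the alias Sales and the function rankDescending are hypothetical. std::stable_sort is used so that salespersons with equal amounts keep their original order.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Sales = std::pair<std::string, double>;   // (salesperson, amount)

// Rank categories in descending order of value; ties keep their input order.
std::vector<Sales> rankDescending(std::vector<Sales> rows) {
    std::stable_sort(rows.begin(), rows.end(),
                     [](const Sales& a, const Sales& b) { return a.second > b.second; });
    return rows;
}
```

The sorted rows can then be drawn directly as a bar chart, with the longest bar first.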


Part-to-whole: categorical subdivisions are measured as a ratio to the whole. A pie chart or bar chart can show the comparison of ratios, such as the market share represented by competitors in a market.
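Before a part-to-whole chart can be drawn, each category must be expressed as a share of the total. The following is a minimal sketch assuming the parts are given as (name, value) pairs; the function name sharesOfWhole is hypothetical.

```cpp
#include <string>
#include <utility>
#include <vector>

// Express each category as a percentage of the whole (e.g. market share).
// Returns 0% for every part if the total is not positive.
std::vector<std::pair<std::string, double>>
sharesOfWhole(const std::vector<std::pair<std::string, double>>& parts) {
    double total = 0.0;
    for (const auto& p : parts) total += p.second;
    std::vector<std::pair<std::string, double>> shares;
    for (const auto& p : parts)
        shares.emplace_back(p.first, total > 0.0 ? 100.0 * p.second / total : 0.0);
    return shares;
}
```

For three competitors with sales of 50, 30, and 20, the resulting shares are 50%, 30%, and 20%.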

Deviation: categorical subdivisions are compared against a reference, such as a comparison of actual vs. budget expenses for several departments of a business for a given time period. A bar chart can show the comparison of the actual versus the reference amount.
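The values plotted in a deviation chart are the differences from the reference. This sketch assumes each department has an actual and a budgeted amount; the types DeptSpend and Deviation and the function deviations are hypothetical.

```cpp
#include <string>
#include <vector>

struct DeptSpend { std::string dept; double actual; double budget; };   // hypothetical input
struct Deviation { std::string dept; double amount; double percent; };  // hypothetical output

// Compare each department's actual spend against its budget (the reference).
// A positive amount means over budget; percent is relative to the budget.
std::vector<Deviation> deviations(const std::vector<DeptSpend>& rows) {
    std::vector<Deviation> out;
    for (const DeptSpend& r : rows) {
        double diff = r.actual - r.budget;
        double pct = r.budget != 0.0 ? 100.0 * diff / r.budget : 0.0;
        out.push_back({r.dept, diff, pct});
    }
    return out;
}
```

For example, a department that spent 110 against a budget of 100 deviates by +10 (+10%), while one that spent 45 against 50 deviates by -5 (-10%).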


Frequency distribution: shows the number of observations of a particular variable for a given interval, such as the number of years in which the stock market return is between intervals such as 0-10%, 11-20%, etc. A histogram, a type of bar chart, may be used for this analysis.

A boxplot helps visualize key statistics about the distribution, such as the median, quartiles, and outliers.
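A histogram starts from bin counts. The sketch below assumes equal-width, half-open intervals [lo, lo + width), [lo + width, lo + 2 * width), and so on, and ignores values outside [lo, hi). This is a simplification of the integer-style intervals in the example above, and the function name histogram is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Count how many observations fall into each fixed-width interval
// [lo, lo+width), [lo+width, lo+2*width), ...; values outside [lo, hi) are ignored.
std::vector<int> histogram(const std::vector<double>& values,
                           double lo, double hi, double width) {
    std::size_t bins = static_cast<std::size_t>(std::ceil((hi - lo) / width));
    std::vector<int> counts(bins, 0);
    for (double v : values) {
        if (v < lo || v >= hi) continue;
        std::size_t b = static_cast<std::size_t>((v - lo) / width);
        if (b >= bins) b = bins - 1;   // guard against rounding at the top edge
        ++counts[b];
    }
    return counts;
}
```

With returns of -5, 3, 9.99, 10, 15, 25, and 40 and bins of width 10 from 0 to 30, the counts are 2, 2, and 1; the values -5 and 40 fall outside the range.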

Correlation: comparison between observations represented by two variables (X, Y) to determine if they tend to move in the same or opposite directions. For example, plotting unemployment (X) and inflation (Y) for a sample of months. A scatter plot is typically used for this message.
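A scatter plot shows correlation visually. A single number summarizing it is the Pearson correlation coefficient r, which is sketched below (the function name pearson is hypothetical). Values of r near +1 mean X and Y move in the same direction, and values near -1 mean they move in opposite directions. The formula used is r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2)).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient r in [-1, 1]:
// r > 0 means X and Y tend to move together, r < 0 in opposite directions.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        double dx = x[i] - mx, dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / std::sqrt(sxx * syy);   // NaN if either series is constant
}
```

Pearson's r only captures linear association, so the scatter plot remains useful for spotting curved or clustered relationships that r would miss.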


Nominal comparison: comparing categorical subdivisions in no particular order, such as the sales volume by product code. A bar chart may be used for this comparison.

Geographic or geospatial: comparison of a variable across a map or layout, such as the unemployment rate by state or the number of persons on the various floors of a building. A cartogram is a typical graphic for this message.


Characteristics of effective graphical display

An effective graphical display should:
show the data
avoid distorting what the data have to say
present many numbers in a small space
make large data sets coherent
encourage the eye to compare different pieces of data
reveal the data at several levels of detail, from a broad overview to the fine structure
serve a reasonably clear purpose: description, exploration, tabulation, or decoration
be closely integrated with the statistical and verbal descriptions of a data set


Visual perception and data visualization

Effective graphics take advantage of pre-attentive processing and attributes, and of the relative strength of these attributes.

Types of information display: tables and graphs

Data display requires planning
Data collection


Benefits of data visualization

Visualization is powerful and effective enough to change someone's mind in a flash. It takes in various datasets quickly, effectively, and efficiently, and makes them accessible to interested viewers. It motivates deep insight with quick access. It gives us the opportunity to approach huge amounts of data and makes them easily comprehensible, whether the field is entertainment, current affairs, financial issues, or political affairs.

It also builds deep insight, prompting us to make good decisions and take immediate action if needed.

In the business world, it has recently emerged in the form of geospatial visualization. Geospatial visualization has become popular because many websites provide web services that attract visitors' interest.


Data Visualization with C++
Chapter 1 “Arrays, Pointers and Structures”
Chapter 2 “Objects and Classes”
Chapter 4 “Inheritance”
Chapter 6 “Algorithm Analysis”


Chapter 1 “Arrays, Pointers and Structures”

In this chapter we examined the basics of pointers, arrays, and structures. The pointer variable emulates the real-life indirect answer. In C++ it is an object that stores the address where some other data reside. The pointer is special because it can be dereferenced, thus allowing access to those other data.

The NULL pointer holds the constant 0, indicating that it is not currently pointing at valid data.

A reference parameter is an alias. It is like a constant pointer, except that the compiler implicitly dereferences it on every access.

Together with ordinary value parameters, reference variables allow three forms of parameter passing: call by value, call by reference, and call by constant reference.

Choosing the best form for a particular application is an important part of the design process.
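The three parameter-passing forms and pointer dereferencing can be illustrated with a few tiny functions. This is a minimal sketch, not code from the book; the function names are hypothetical, and it uses nullptr, the C++11 spelling of the null pointer that the text calls NULL.

```cpp
#include <cstddef>
#include <string>

// Call by value: the function works on its own copy; the caller's object is unchanged.
int squareOf(int n) { n = n * n; return n; }

// Call by reference: the parameter is an alias, so the caller's object is modified.
void increment(int& n) { ++n; }

// Call by constant reference: no copy is made, but the object cannot be modified.
std::size_t lengthOf(const std::string& s) { return s.size(); }

// A pointer stores an address; dereferencing it (*p) reaches the data it points to.
// A null pointer (NULL / 0, or nullptr since C++11) points at no valid data.
int valueOr(const int* p, int fallback) {
    return p != nullptr ? *p : fallback;
}
```

A common rule of thumb is call by value for small built-in types, call by constant reference for large objects that are only read, and call by reference only when the function is meant to change the caller's object.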


An array is a collection of identically typed objects. In C++ there is a primitive version with second-class semantics; a vector is also part of the standard library. In both cases, no index range checking is performed by the [] operator, and out-of-bounds array accesses can corrupt other objects. Because primitive arrays are second-class, they cannot be copied by using the assignment operator; instead they must be copied element by element. A vector, however, can be copied in a single assignment statement.

A vector can be expanded as needed by calling resize.
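The difference between primitive arrays and vectors is easiest to see side by side. In this minimal sketch (function names hypothetical) the array must be copied element by element, while the vector is copied with one assignment and grown with resize. It also shows that vector::at, unlike operator[], does check the index and throws std::out_of_range.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Primitive arrays cannot be assigned (int b[3]; b = a; does not compile),
// so they must be copied element by element.
void copyArray(const int (&src)[3], int (&dst)[3]) {
    for (std::size_t i = 0; i < 3; ++i) dst[i] = src[i];
}

// A vector is copied with a single assignment and can grow with resize.
std::vector<int> copyAndGrow(const std::vector<int>& src, std::size_t newSize) {
    std::vector<int> dst;
    dst = src;            // one assignment copies every element
    dst.resize(newSize);  // new elements are value-initialized (0 for int)
    return dst;
}

// operator[] does no range checking; at() does, throwing std::out_of_range.
bool inRange(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```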


Structures are also used to store several objects, but unlike arrays, the objects need not be identically typed. Each object in the structure is a member, and is accessed by the . (dot) member operator. The -> operator is used to access a member of a structure that is accessed indirectly through a pointer.

We also noted that a list of items can be stored non-contiguously by using a linked list. The advantage is that less space is used for large objects than in the array-doubling technique. The penalty is that access of the ith item is no longer constant-time but requires examination of i structures.
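A minimal sketch of these ideas follows; the Point and Node types and the nth function are hypothetical. Point mixes member types, -> reaches members through a pointer, and nth shows why reaching the ith item of a linked list takes i steps instead of constant time.

```cpp
#include <cstddef>
#include <string>

// A structure groups members that need not share a type.
struct Point {
    std::string label;
    double x;
    double y;
};

// A singly linked list node: items are stored non-contiguously.
struct Node {
    int value;
    Node* next;
};

// Reaching the i-th item means following i links from the head,
// so access time grows linearly with i.
const Node* nth(const Node* head, std::size_t i) {
    while (head != nullptr && i > 0) {
        head = head->next;   // -> accesses a member through a pointer
        --i;
    }
    return head;             // nullptr if the list has fewer than i + 1 items
}
```

Given a Point p and a pointer pp = &p, the expressions p.x and pp->x name the same member.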


Chapter 2 “Objects and Classes”

In this chapter we described the C++ class construct. The class is the C++ mechanism used to create new types. Through it we can define construction and destruction of objects, define copy semantics, define input and output operations, overload almost all operators, define implicit and explicit type conversion operations (sometimes a bad thing), and provide for information hiding and atomicity.

The class consists of two parts: the interface and the implementation. The interface tells the user of the class what the class does; the implementation does it. The implementation frequently contains proprietary code and in some cases is distributed only in precompiled form.
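The split between interface and implementation can be sketched with a small hypothetical Counter class. In a real project the class declaration would normally live in a header file and the member definitions in a separate source file; they are shown together here only so the example is self-contained.

```cpp
// Interface: what the class does (normally placed in a header file).
class Counter {
  public:
    explicit Counter(int start = 0);
    void increment();
    int value() const;

  private:                 // information hiding: callers cannot touch count directly
    int count;
};

// Implementation: how it does it (normally a separate .cpp file,
// which may be shipped only in precompiled form).
Counter::Counter(int start) : count(start) {}
void Counter::increment() { ++count; }
int Counter::value() const { return count; }
```

Users of Counter depend only on the three public members, so the implementation can change without affecting their code.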


Information hiding can be enforced by using the private section in the interface. Initialization of objects is controlled by the constructor functions, and the destructor function is called when an object goes out of scope.

The destructor typically performs clean-up work, such as closing files and freeing memory.

Finally, when implementing a class, the use of const and correct parameter-passing mechanisms is crucial, as is the decision whether to accept the default Big Three (destructor, copy constructor, and copy assignment operator), write our own Big Three, or completely disallow copying. These choices matter not only for efficiency but in some cases also for correctness.
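A class that owns heap memory is the classic case where the default Big Three are wrong, because the defaults would copy only the pointer. The sketch below (class names hypothetical, not taken from the book) writes its own destructor, copy constructor, and copy assignment operator, using the copy-and-swap idiom for assignment so that self-assignment is safe. It also shows the C++11 way to disallow copying completely with = delete. Older code achieves the same effect by declaring the copy operations private and not defining them.

```cpp
#include <algorithm>
#include <cstddef>
#include <type_traits>
#include <utility>

// A class that owns heap memory needs its own Big Three:
// destructor, copy constructor, and copy assignment operator.
class IntBuffer {
  public:
    explicit IntBuffer(std::size_t n) : size_(n), data_(new int[n]()) {}
    ~IntBuffer() { delete[] data_; }                  // clean-up: free memory

    IntBuffer(const IntBuffer& rhs)                   // deep copy of the elements
        : size_(rhs.size_), data_(new int[rhs.size_]) {
        std::copy(rhs.data_, rhs.data_ + size_, data_);
    }

    IntBuffer& operator=(const IntBuffer& rhs) {      // copy-and-swap
        IntBuffer tmp(rhs);                           // copy first, so *this is
        std::swap(size_, tmp.size_);                  // untouched if new[] throws
        std::swap(data_, tmp.data_);
        return *this;                                 // tmp frees the old array
    }

    int& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

  private:
    std::size_t size_;
    int* data_;
};

// Alternatively, copying can be disallowed completely.
class NonCopyable {
  public:
    NonCopyable() = default;
    NonCopyable(const NonCopyable&) = delete;
    NonCopyable& operator=(const NonCopyable&) = delete;
};

static_assert(!std::is_copy_constructible<NonCopyable>::value,
              "copying NonCopyable is disallowed");
```

In modern C++ the same reasoning extends to the move constructor and move assignment operator (the "rule of five"); without them, IntBuffer simply falls back to copying.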