Chapter 1: Introduction, Exploring Data

Richard Liu, School of Mathematics, XMU
February 26, 2020
Sections: Introduction; Terminologies Identification; Summary Statistics; Basic Visualization; Visualization for High-Dimensional Data; OLAP and Multidimensional Data Analysis; Supplementary
Slides: 61

Transcript of Chapter 1: Introduction, Exploring Data

Page 1: Chapter 1: Introduction, Exploring Data

Introduction
Terminologies Identification
Summary Statistics
Basic Visualization
Visualization for High-Dimensional Data
OLAP and Multidimensional Data Analysis
Supplementary

Chapter 1: Introduction, Exploring Data

Richard Liu
School of Mathematics, XMU

February 26, 2020

Richard Liu Chapter 1: Introduction, Exploring Data 1 / 61

Page 2: Chapter 1: Introduction, Exploring Data


Content
1 Introduction
2 Terminologies Identification
3 Summary Statistics
4 Basic Visualization
5 Visualization for High-Dimensional Data
6 OLAP and Multidimensional Data Analysis
7 Supplementary

Page 3: Chapter 1: Introduction, Exploring Data

Section 1

Introduction

Page 4: Chapter 1: Introduction, Exploring Data

Introduction

Four topics are discussed in this chapter:
Terminologies Identification
Summary Statistics
Visualization
On-Line Analytical Processing (OLAP)
  Used for exploring multidimensional arrays of values.

This chapter is closely related to the area known as Exploratory Data Analysis (EDA). Other parts of EDA, namely Cluster Analysis and Anomaly Detection, are covered in Chapters 8 to 10.

Page 5: Chapter 1: Introduction, Exploring Data

Terminologies Identification

Data Science
Machine Learning
Data Mining
Business Analytics

Page 6: Chapter 1: Introduction, Exploring Data

Terminologies Identification

Classification
Regression
Clustering

Page 7: Chapter 1: Introduction, Exploring Data

Terminologies Identification

Numerical Data
Categorical Data
Binary Data

Page 8: Chapter 1: Introduction, Exploring Data

Terminologies Identification

Low-dimensional Data
High-dimensional Data

Page 9: Chapter 1: Introduction, Exploring Data

Machine Learning Structures

Training Set
Test Set
Validation Set

Page 10: Chapter 1: Introduction, Exploring Data

So, what is data science?

Page 11: Chapter 1: Introduction, Exploring Data

Section 3

Summary Statistics

Page 12: Chapter 1: Introduction, Exploring Data

Summary Statistics

A summary statistic is a quantity that describes an overall characteristic of a set of values.

Definition 1: Frequency
Given a random variable $x$ that can take the values $\{v_1, \dots, v_n\}$ ($v_i \neq v_j$ for $i \neq j$), then

$$\mathrm{Frequency}(v_i) = \frac{\#\{v_i\}}{|S|}$$

where $\#\{v_i\}$ is the number of occurrences of the value $v_i$ in a specific data set $S$, and $|S|$ is the number of observations in $S$.
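The frequency definition can be sketched in a few lines of Python. This is only an illustration: the function name `frequency` and the sample data `S` are made up here, and the sketch assumes the denominator is the number of observations in the data set, so that the frequencies sum to 1.

```python
from collections import Counter


def frequency(values):
    """Map each distinct value to its relative frequency in `values`."""
    counts = Counter(values)   # counts[v] plays the role of #{v}
    total = len(values)        # assumed denominator: number of observations
    return {v: c / total for v, c in counts.items()}


# Hypothetical data set S for one categorical attribute.
S = ["a", "b", "a", "c", "a", "b"]
freq = frequency(S)
print(freq)
```

With this convention, the frequencies of all distinct values add up to 1, which is a quick sanity check on any implementation.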

Page 13: Chapter 1: Introduction, Exploring Data

Summary Statistics

Definition 2: Mode
Given a data set $S$, $\mathrm{Mode}(S) = \arg\max_{v_i} \mathrm{Frequency}(v_i)$.

In real data, the mode usually occurs more than once.
The mode is frequently used as an indicator of missing values. (Why?)
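A small sketch of the mode, under assumptions: the helper `modes` and the `ages` data are hypothetical, and the placeholder `-1` for an unknown age is one plausible reason a mode might hint at missing values, not something the slides state.

```python
from collections import Counter


def modes(values):
    """Return every value that attains the maximum count."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


# Hypothetical ages in which -1 was entered whenever the age was unknown.
ages = [23, 35, -1, 41, -1, 29, -1, 35]
print(modes(ages))
```

Returning a list rather than a single value keeps ties visible, since more than one value can attain the maximum frequency.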

Page 14: Chapter 1: Introduction, Exploring Data

Summary Statistics

Definition 3: Percentiles
The $p$th percentile $x_p$ is defined as a value such that $p\%$ of the observed values of $x$ are less than $x_p$.

Obviously $x_{0\%} = \min(x)$ and $x_{100\%} = \max(x)$.

Definition 4: Mean
Assume $\{x_1, \dots, x_m\}$ is an ordered set of observed values, denoted by $x$, then

$$\mathrm{mean}(x) = \bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i$$
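Percentiles admit several conventions in practice, so the following is only a sketch under one assumption: it uses the nearest-rank rule, which is a choice made here and not necessarily the one intended by the slides. The names `percentile`, `mean`, and the sample `x` are hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile for 0 <= p <= 100 (one of several conventions)."""
    ordered = sorted(values)
    if p <= 0:
        return ordered[0]                       # 0th percentile is the minimum
    rank = math.ceil(p * len(ordered) / 100)    # 1-based rank
    return ordered[rank - 1]


def mean(values):
    """Arithmetic mean: the sum of the values divided by their number m."""
    return sum(values) / len(values)


x = [15, 20, 35, 40, 50]
print(percentile(x, 0), percentile(x, 50), percentile(x, 100), mean(x))
```

Under this rule the 0th and 100th percentiles coincide with the minimum and maximum, in line with the remark above.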

Page 15: Chapter 1: Introduction, Exploring Data

Summary Statistics

Notation 1
Define $x^{\downarrow} = \{x_{(1)}, \dots, x_{(m)}\}$ as a permutation of $x$ with $x_{(1)} \geq x_{(2)} \geq \dots \geq x_{(m)}$.

Definition 5: Median
$$\mathrm{median}(x) = \mathrm{median}(x^{\downarrow}) = \begin{cases} x_{(r+1)} & m = 2r + 1 \\ \frac{1}{2}\left(x_{(r)} + x_{(r+1)}\right) & m = 2r \end{cases}$$

where $m = |x|$, the cardinality of the set $x$.

Mean and median are measures of the location of a set of values.
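The two cases of the median definition can be sketched as follows. The function name `median` is chosen here for illustration; the sort is descending to mirror the ordering in Notation 1, although an ascending sort would give the same result.

```python
def median(values):
    """Median via the sorted sequence x(1) >= x(2) >= ... >= x(m)."""
    desc = sorted(values, reverse=True)
    m = len(desc)
    r = m // 2
    if m % 2 == 1:                        # m = 2r + 1
        return desc[r]                    # x(r+1) in 1-based notation
    return (desc[r - 1] + desc[r]) / 2    # (x(r) + x(r+1)) / 2 when m = 2r


print(median([5, 1, 3]), median([4, 1, 3, 2]))
```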

Page 16: Chapter 1: Introduction, Exploring Data


Summary Statistics

Definition 6: Range
$$\mathrm{range}(x) = \max(x) - \min(x)$$

Definition 7: Variance
$$\mathrm{variance}(x) = s_x^2 = \frac{1}{m-1} \sum_{i=1}^{m} (x_i - \bar{x})^2$$

Definition 8: Standard Deviation
$$\mathrm{sd}(x) = s_x = \sqrt{s_x^2}$$

These three are measures of the spread of a set of values.
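A minimal sketch of the three spread measures, written directly from the definitions. The function names and the sample `x` are hypothetical; note the divisor `m - 1` in the variance.

```python
import math


def value_range(x):
    """Range: the largest value minus the smallest."""
    return max(x) - min(x)


def variance(x):
    """Sample variance with the m - 1 divisor."""
    m = len(x)
    xbar = sum(x) / m
    return sum((xi - xbar) ** 2 for xi in x) / (m - 1)


def sd(x):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(x))


x = [2, 4, 4, 4, 5, 5, 7, 9]
print(value_range(x), variance(x), sd(x))
```

Python's standard `statistics.variance` uses the same `m - 1` divisor, so it can serve as a cross-check.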

Page 17: Chapter 1: Introduction, Exploring Data


Other Quantities

The previous statistics are sensitive to outliers, that is, they are not robust, so some alternatives are considered.

Definition 9: Absolute Average Deviation (AAD)
$$\mathrm{AAD}(x) = \frac{1}{m} \sum_{i=1}^{m} |x_i - \bar{x}|$$

Definition 10: Median Absolute Deviation (MAD)
$$\mathrm{MAD}(x) = \mathrm{median}\left(\{|x_i - \bar{x}|\}_{i=1}^{m}\right)$$
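The two definitions can be sketched as below. Both functions measure deviations from the mean, following the formulas as written on the slide (some texts center the MAD on the median instead). The names `aad`, `mad` and the sample `x`, which contains one deliberate outlier, are hypothetical.

```python
from statistics import mean, median


def aad(x):
    """Absolute average deviation: mean of |x_i - mean(x)|."""
    xbar = mean(x)
    return sum(abs(xi - xbar) for xi in x) / len(x)


def mad(x):
    """Median absolute deviation, with deviations taken from the mean."""
    xbar = mean(x)
    return median([abs(xi - xbar) for xi in x])


x = [1, 2, 3, 4, 100]   # 100 is an outlier
print(aad(x), mad(x))
```

In this sample the single outlier pulls the AAD up considerably, while the MAD stays close to the deviations of the typical points.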

Page 18: Chapter 1: Introduction, Exploring Data

Other Quantities

Definition 11: Interquartile Range (IQR)
$$\mathrm{IQR}(x) = x_{75\%} - x_{25\%}$$
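A sketch of the IQR, assuming a nearest-rank percentile rule (an assumption made here; other percentile conventions give slightly different quartiles). The helper names and the sample data are hypothetical.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile for 0 < p <= 100 (an assumed convention)."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]


def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return percentile(values, 75) - percentile(values, 25)


clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(iqr(clean), iqr(with_outlier))
```

Replacing the largest value by an extreme one leaves the IQR of this sample unchanged, whereas the range would grow from 8 to 99.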

Page 19: Chapter 1: Introduction, Exploring Data

Multivariate Case

Sometimes a data set consists of several sets of values, each belonging to a different attribute. In other words,

$$x = (x_1, \dots, x_n)$$

where $n$ is the number of attributes and $x_i$ is the set of values comprising the $i$th attribute of all observed data.
In this case, every observed data point is a vector, not a number. So we define the mean as

$$\bar{x} = (\bar{x}_1, \dots, \bar{x}_n)$$
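The multivariate mean is just the per-attribute mean. A minimal sketch, assuming the data is stored as a list of rows (one tuple per observed data point); the name `mean_vector` and the sample `data` are hypothetical.

```python
def mean_vector(rows):
    """Per-attribute mean of a list of equally long observation vectors."""
    m = len(rows)
    # zip(*rows) regroups the values by attribute (column).
    return [sum(column) / m for column in zip(*rows)]


# Three observations with two attributes each.
data = [(1.0, 10.0), (2.0, 20.0), (3.0, 60.0)]
print(mean_vector(data))
```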

Page 20: Chapter 1: Introduction, Exploring Data

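Multivariate Case

Definition 12: Covariance
$$\mathrm{covariance}(x_i, x_j) = \frac{1}{m-1} \sum_{k=1}^{m} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)$$

Definition 13: Correlation
$$r_{ij} = \mathrm{correlation}(x_i, x_j) = \frac{\mathrm{covariance}(x_i, x_j)}{s_i s_j}$$

Here $x_{ki}$ is the value of the $i$th attribute for the $k$th observation, and $s_i$ is the standard deviation of $x_i$.

Evidently $\mathrm{covariance}(x_i, x_i) = \mathrm{variance}(x_i)$.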
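A sketch of covariance and correlation for two attributes, written from the definitions with the `m - 1` divisor. The function names and the sample lists are hypothetical. The standard deviations in the denominator are obtained from the fact that the covariance of an attribute with itself is its variance.

```python
import math


def covariance(xi, xj):
    """Sample covariance of two equally long attribute columns."""
    m = len(xi)
    mean_i, mean_j = sum(xi) / m, sum(xj) / m
    return sum((a - mean_i) * (b - mean_j) for a, b in zip(xi, xj)) / (m - 1)


def correlation(xi, xj):
    """Covariance divided by the product of the standard deviations."""
    s_i = math.sqrt(covariance(xi, xi))   # covariance(x, x) is the variance
    s_j = math.sqrt(covariance(xj, xj))
    return covariance(xi, xj) / (s_i * s_j)


a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]     # exactly 2 * a
c = [4.0, 3.0, 2.0, 1.0]     # a reversed
print(correlation(a, b), correlation(a, c))
```

Perfectly linear relationships give a correlation of 1 or -1 up to floating-point rounding, which makes them convenient test cases.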

Page 21: Chapter 1: Introduction, Exploring Data

Section 4

Basic Visualization

Page 22: Chapter 1: Introduction, Exploring Data

Motivation

People can absorb large amounts of data from one graph quickly.
Visualization makes use of the domain knowledge that is 'locked up in people's heads.' (Hard for data mining)

Page 23: Chapter 1: Introduction, Exploring Data

One example

Page 24: Chapter 1: Introduction, Exploring Data

General Concepts

Data objects, attributes and the relationships among data objects are translated into graphical elements such as points, lines, shapes and colors.
Note that the representation depends on the type of attribute (nominal, ordinal, continuous). When the values themselves are ordered, it is fine to represent them in a coordinate system (with x, y, z axes).
You need to preserve some important information about relative attributes (such as physical location).
People tend to believe that data points that are visually close to each other have similar values for their attributes.

Page 25: Chapter 1: Introduction, Exploring Data


Challenge

How can the data be made easy to observe? This is what visualization pays most attention to.

Page 26: Chapter 1: Introduction, Exploring Data

Data Arrangement

Page 27: Chapter 1: Introduction, Exploring Data

Data Arrangement

Page 28: Chapter 1: Introduction, Exploring Data

Selection

Sometimes it is extremely hard to show all data objects and attributes on one graph, so we use selection.
Example: Many attributes -> a series of two-dimensional plots.

Page 29: Chapter 1: Introduction, Exploring Data

Visualization Techniques

Stem and Leaf Plots
  Splitting values into groups, each containing those values that are the same except for the last digit.
  Suitable for small values.

Histograms
  Displaying the distribution of values for attributes by dividing the possible values into bins and showing the number of objects that fall into each bin.
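The grouping behind both plots can be sketched without any plotting library. This is an illustration under assumptions: the function names and the `scores` data are made up, the stem-and-leaf helper assumes non-negative integers, and the histogram helper assumes equal-width bins with every value inside `[low, high]`.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers by all digits except the last one."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # the last digit is the leaf
        groups[stem].append(leaf)
    return dict(groups)


def histogram(values, low, high, bins):
    """Count how many values fall into each of `bins` equal-width bins."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        index = min(int((v - low) / width), bins - 1)   # v == high goes in the last bin
        counts[index] += 1
    return counts


scores = [12, 15, 21, 23, 23, 28, 34, 35, 41]
print(stem_and_leaf(scores))
print(histogram(scores, low=10, high=50, bins=4))
```

Both helpers return plain data structures, so any plotting tool could draw the actual figure from them.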

Page 30: Chapter 1: Introduction, Exploring Data

Examples

Page 31: Chapter 1: Introduction, Exploring Data

Examples

Page 32: Chapter 1: Introduction, Exploring Data

Examples

Page 33: Chapter 1: Introduction, Exploring Data

Visualization Techniques

Box Plot
  Shows the distribution of the values of a single numerical attribute.

Pie Chart
  Similar to a histogram, typically used for categorical attributes.
  The relative sizes of the slices are hard to distinguish, so it is not preferred in technical work.

Page 34: Chapter 1: Introduction, Exploring Data

Examples

Page 35: Chapter 1: Introduction, Exploring Data

Examples

Page 36: Chapter 1: Introduction, Exploring Data

Visualization Techniques

Scatter Plot
  Widely used for judging the relation between two attributes given a series of data objects.

Page 37: Chapter 1: Introduction, Exploring Data

Examples

Page 38: Chapter 1: Introduction, Exploring Data

Visualization Techniques

Percentile Plots
Empirical Cumulative Distribution Functions (ECDF)
  A function graph.
  For any given x, it shows the fraction (or probability) of the points that are less than or equal to x.

Notation 2
$F(x) = P(X \leq x)$, where $X$ is a random variable.
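An empirical CDF can be sketched as a function built from a sample: for each x it returns the fraction of sample points not exceeding x, matching the `<=` in Notation 2. The name `ecdf` and the sample values are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F, where F(x) is the fraction of sample points <= x."""
    ordered = sorted(sample)
    m = len(ordered)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / m

    return F


F = ecdf([1, 2, 2, 3, 5])
print(F(0), F(2), F(4), F(5))
```

The resulting function is a non-decreasing step function that runs from 0 to 1, which is what an ECDF plot displays.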

Page 39: Chapter 1: Introduction, Exploring Data

Example

Page 40: Chapter 1: Introduction, Exploring Data

One special case: Demonstrating several attributes in one graph

Page 41: Chapter 1: Introduction, Exploring Data

Visualization Techniques (Details Removed)

Contour Plots
Surface Plots
Vector Field Plots
Lower-Dimensional Slices
Animation

Page 42: Chapter 1: Introduction, Exploring Data

Section 5

Visualization for High-Dimensional Data

Page 43: Chapter 1: Introduction, Exploring Data

Visualization Techniques

Matrix
  Correlation Matrix: One important step in data analysis, used to capture the features of the whole data set.
  A simple kind of clustering.

Parallel Coordinates
  One coordinate axis for each attribute, but the different axes are parallel to one another instead of perpendicular.
  Sometimes it can also be used to find features.

Page 44: Chapter 1: Introduction, Exploring Data

Examples

Page 45: Chapter 1: Introduction, Exploring Data

Examples

Page 46: Chapter 1: Introduction, Exploring Data

Section 6

OLAP and Multidimensional Data Analysis

Page 47: Chapter 1: Introduction, Exploring Data

Discretizing

Discretizing turns a continuous attribute into a categorical attribute.
In this way we can arrange the data within a table or a multidimensional data representation.
Cross Tabulation can also be applied after doing so.
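Discretization followed by cross tabulation can be sketched as follows. Everything specific here is hypothetical: the `records` (age, bought) pairs, the cut points 30 and 50, and the labels are invented for illustration only.

```python
from collections import Counter


def discretize(value, cut_points, labels):
    """Return the label of the interval that `value` falls into.

    `cut_points` must be sorted; `labels` has one more entry than `cut_points`.
    """
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]


# Hypothetical records: (age, bought the product?)
records = [(22, "yes"), (35, "no"), (47, "yes"), (58, "no"), (63, "no"), (29, "yes")]
CUTS, LABELS = [30, 50], ["young", "middle", "senior"]

# Cross tabulation: count each (age group, bought) combination.
crosstab = Counter((discretize(age, CUTS, LABELS), bought) for age, bought in records)
for (group, bought), count in sorted(crosstab.items()):
    print(group, bought, count)
```

Once the continuous attribute is categorical, each cell of the table is simply a count of the records sharing that pair of categories.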

Page 48: Chapter 1: Introduction, Exploring Data

Examples

Page 49: Chapter 1: Introduction, Exploring Data

Examples

Page 50: Chapter 1: Introduction, Exploring Data

Cross Tabulation

Page 51: Chapter 1: Introduction, Exploring Data

Aggregation

Aggregation is one of the most general methods for analyzing multidimensional data objects.
One example is summation.
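Summation over some dimensions of a multidimensional array can be sketched with a dictionary keyed by coordinate tuples. The `sales` cube, its (product, region, quarter) dimensions, and the helper `aggregate` are all hypothetical.

```python
from collections import defaultdict


def aggregate(cube, keep):
    """Sum the cube over every dimension whose index is not listed in `keep`."""
    totals = defaultdict(int)
    for key, value in cube.items():
        reduced_key = tuple(key[i] for i in keep)
        totals[reduced_key] += value
    return dict(totals)


# Hypothetical cube with dimensions (product, region, quarter).
sales = {
    ("pen", "north", "Q1"): 10,
    ("pen", "north", "Q2"): 12,
    ("pen", "south", "Q1"): 7,
    ("ink", "north", "Q1"): 3,
    ("ink", "south", "Q2"): 5,
}

print(aggregate(sales, keep=[0]))      # totals per product
print(aggregate(sales, keep=[0, 2]))   # totals per product and quarter
print(aggregate(sales, keep=[]))       # grand total
```

Keeping fewer dimensions gives a coarser summary, and keeping none collapses the whole cube into a single total.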

Page 52: Chapter 1: Introduction, Exploring Data

Examples

Page 53: Chapter 1: Introduction, Exploring Data

Examples

Page 54: Chapter 1: Introduction, Exploring Data

Dimensionality Reduction

In descriptive data analysis, this means trying to reduce the number of attributes shown in the table/graph with aggregation.
In regression, this means trying to select fewer predictors in order to diminish the correlation among them.
Some examples: PCA, SVD.

Page 55: Chapter 1: Introduction, Exploring Data

These subjects are ignored

Pivoting
Slicing and Dicing
Roll-Up
Drill-Down

Page 56: Chapter 1: Introduction, Exploring Data

Section 7

Supplementary

Page 57: Chapter 1: Introduction, Exploring Data

Preliminary

Sample Space
Probability
Distribution Function
Expectation
Variance
Central Limit Theorem

Page 58: Chapter 1: Introduction, Exploring Data

Propositions in Mathematical Statistics

Assume $x_1, \dots, x_n \sim N(\mu, \sigma^2)$ are independent, then

Proposition 1
Let $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, then
$$E(\bar{x}) = \mu$$

Proposition 2
Let $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$, then
$$E(s^2) = \sigma^2$$
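A short derivation of both propositions, as a sketch. It assumes the $x_i$ are independent with common mean $\mu$ and variance $\sigma^2$, so that $\mathrm{Var}(\bar{x}) = \sigma^2 / n$; normality itself is not needed for these two expectations.

```latex
\begin{align*}
E(\bar{x}) &= \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\cdot n\mu = \mu, \\
\sum_{i=1}^{n}(x_i-\bar{x})^2 &= \sum_{i=1}^{n}(x_i-\mu)^2 - n(\bar{x}-\mu)^2, \\
E\Big(\sum_{i=1}^{n}(x_i-\bar{x})^2\Big) &= n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2, \\
E(s^2) &= \frac{1}{n-1}\,(n-1)\sigma^2 = \sigma^2.
\end{align*}
```

The second line follows from writing $x_i - \bar{x} = (x_i - \mu) - (\bar{x} - \mu)$ and expanding the square. The third line is why the divisor in $s^2$ is $n-1$ rather than $n$.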

Page 59: Chapter 1: Introduction, Exploring Data

Propositions in Mathematical Statistics

Proposition 3
Let $f(c) = \sum_{i=1}^{n} (x_i - c)^2$, then
$$\bar{x} = \arg\min_{c} f(c)$$

Proposition 4
Let $f(c) = \sum_{i=1}^{n} |x_i - c|$, then
$$\mathrm{median}(x) = \arg\min_{c} f(c)$$
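Both propositions can be checked numerically on a small example. This is a brute-force illustration, not a proof: the sample `x` and the candidate grid with step 0.1 are hypothetical choices, and the sample has an odd number of points so that the minimizer of the absolute-deviation sum is unique.

```python
from statistics import mean, median


def sum_squares(x, c):
    """f(c) of Proposition 3: the sum of squared deviations from c."""
    return sum((xi - c) ** 2 for xi in x)


def sum_abs(x, c):
    """f(c) of Proposition 4: the sum of absolute deviations from c."""
    return sum(abs(xi - c) for xi in x)


x = [1, 2, 4, 7, 11]
candidates = [i / 10 for i in range(0, 121)]   # grid 0.0, 0.1, ..., 12.0

best_for_squares = min(candidates, key=lambda c: sum_squares(x, c))
best_for_abs = min(candidates, key=lambda c: sum_abs(x, c))
print(best_for_squares, mean(x))
print(best_for_abs, median(x))
```

On this sample the grid minimizers land on the mean and the median respectively, as the propositions predict.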

Page 60: Chapter 1: Introduction, Exploring Data

Python in Numerical Computation

For more information, see: https://weakcha.github.io/WISERCLUB-Final/Experiments.html

Page 61: Chapter 1: Introduction, Exploring Data

Thank you!
