Data visualization in Health related research

Data Visualization in Health-Related Research, by Debdulal Dutta Roy, Indian Statistical Institute, Kolkata

Transcript of Data visualization in Health related research

Page 1: Data visualization in Health related research

Data Visualization in Health-Related Research

Debdulal Dutta Roy, Indian Statistical Institute, Kolkata

Page 2: Data visualization in Health related research

●The Policy seeks to reach everyone in a comprehensive, integrated way to move towards wellness. It aims at achieving universal health coverage and delivering quality health care services to all at affordable cost.

●The policy is patient-centric and quality-driven. It addresses health security and Make in India for drugs and devices.

●The main objective of the National Health Policy 2017 is to achieve the highest possible level of good health and well-being, through a preventive and promotive health care orientation in all developmental policies, and to achieve universal access to good quality health care services without anyone having to face financial hardship as a consequence.

Why do you learn Health Data Visualization ?

16.3.2017

Page 3: Data visualization in Health related research

Why do you learn Health Data Visualization ?

1. Quality control chart

2. Quality diversity or stratification

3. Quality correlates

4. Quality monitoring

5. Quality prediction

6. Quality Assurance

Page 4: Data visualization in Health related research

Why do you learn Health Data Visualization ?

Efficiency: decreasing costs by avoiding duplicative or unnecessary diagnostic or therapeutic interventions.

Enhancing quality: enhancing the quality of health care by involving consumers as additional power for quality assurance, and directing patient streams to the best quality providers.

Evidence based: e-Health interventions should be evidence based, in the sense that their effectiveness and efficiency should not be assumed but proven by rigorous evaluation.

Empowerment of consumers and patients: by making the knowledge bases of medicine and personal electronic records accessible to consumers over the internet, e-Health opens new avenues for patient-centered medicine and enables evidence-based patient choice.

[Diagram: Health Data Visualization linked to Access, Quality, Affordability, Lowering of Disease burden, Efficiency monitoring, and Health surveillance]

Page 5: Data visualization in Health related research

Why do you learn Health Data Visualization ?

Encouragement: a new relationship develops between the patient and the health professional, a true partnership in which decisions are made in a shared manner.

Education: physicians are educated through online resources (medical education), and consumers through health education and preventive information.

Extending: the scope of health care is extended beyond its conventional boundaries, in both a geographical and a conceptual sense. e-Health enables consumers to easily obtain health services online from global providers.

Ethics: e-Health involves new forms of patient-physician interaction and poses new challenges and threats to ethical issues such as online professional practice, informed consent, privacy and equity.

Equity: people who do not have the money, skills, and access to computers and networks cannot use computers effectively. As a result, these patient populations are the least likely to benefit from advances in information technology, unless political measures ensure equitable access for all.

Page 6: Data visualization in Health related research

Why is E-health care important in India ?

India is a vast country with complex socio-economic characteristics that are reflected in its medical systems. These include an insufficient

number of primary care doctors practising in rural and semi-urban areas and an ongoing need to update the knowledge of those who do work

in rural areas.

Qualified doctors’ practice can diverge widely from standards of care, with many medical practitioners lacking formal qualifications altogether.

Out-of-pocket expenditure, which constitutes around 80% of the total healthcare spending in India may be further inflated by travel costs in

both urban and rural areas.

Thus, many conditions remain untreated, are managed with prescription medicines purchased over the counter, or are handled by faith healers.

Consequently, the 70% of the population that lives in rural areas, in particular, has limited access to adequate health care.

Finally, epidemiological data are often unavailable or unreliable, hampering the informed design of preventive health programmes.

Page 7: Data visualization in Health related research

Data Visualization in health care by Artificial Intelligence

Artificial intelligence (AI) in healthcare uses algorithms and software to approximate human cognition in

the analysis of complex medical data.

The primary aim of health-related AI applications is to analyze relationships between prevention or

treatment techniques and patient outcomes.

AI programs have been developed and applied to practices such as diagnosis processes, treatment

protocol development, drug development, personalized medicine and patient monitoring and care, among

others.

Page 8: Data visualization in Health related research

Purpose

1. Health infographics for community awareness

2. Communicating statistics for academic purposes

Page 9: Data visualization in Health related research

Descriptive Statistics

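>mean(x); median(x) # mean and median

>names(which.max(table(x))) # most frequent value; mode(x) returns the storage type, not the statistical mode

>sort(x) # Sorting from Minimum to Maximum

>summary(x)

>#install.packages("psych") # run once to install the package

>library(psych)

>describe(x)

>sd(x)

>table(x)

>prop.table(table(x)) # Proportion

>ftable(x,y) # frequency of multiple vectors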
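The descriptive commands on this slide can be tried on a small vector. The sketch below is only an illustration: the scores are made up, and the helper name stat_mode is an assumption, since base R has no built-in function for the statistical mode.

```r
# Hypothetical anxiety scores (illustrative only, not from the lecture data)
scores <- c(3, 5, 5, 2, 4, 5, 3)

# Statistical mode: the most frequent value(s)
stat_mode <- function(v) {
  freq <- table(v)
  as.numeric(names(freq)[freq == max(freq)])
}

avg    <- mean(scores)        # arithmetic mean
mid    <- median(scores)      # middle value of the sorted scores
common <- stat_mode(scores)   # most frequent value
props  <- prop.table(table(scores))  # prop.table() expects a table, not raw data

print(c(mean = avg, median = mid, mode = common))
print(round(props, 2))
```

Passing the raw vector to prop.table() would divide each score by the sum of the scores, which is why the frequency table is built first.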

Page 10: Data visualization in Health related research

Compare two groups

> t.test(x$Safe_school,x$Delinquency)

>oneway.test(y~y2) # y is metric and y2 is categorical

oneway.test(x$Safe_school~x$Delinquency)

One-way analysis of means (not assuming equal variances)

data: x$Safe_school and x$Delinquency

F = 73.84, num df = 8.000, denom df = 81.964, p-value < 2.2e-16

Page 11: Data visualization in Health related research

Measurement Scales

Nominal

Ordinal

Interval

Ratio

Page 12: Data visualization in Health related research

Data visualization: Bar plot with SPSS

480 archive records were retrieved. Each datum represents a specific psychiatric disorder.

Results show that the largest number of patients suffered from depression, and that the pattern of other psychiatric disorders changed with age.

In SPSS: Bar plot for variables > use mean, Row = sex, Col = age

Page 13: Data visualization in Health related research

Depression by age and sex

Page 14: Data visualization in Health related research

Stacked Bar plot

Page 15: Data visualization in Health related research

Correspondence Analysis using R

Step 1: Download the ca package. Select the Tools option and enter ca in the space provided. Click on the Install button. The package will be installed.

Page 16: Data visualization in Health related research

Correspondence Analysis using R (contd.)

Step 2: Load the library to use the ca package

> library(ca)

Step 3: Read the file for analysis (here Disorder_Data is the name of the data object)

> Disorder_Data <- read.table(file.choose(), header=T, sep=",", row.names=1)

file.choose() opens a dialog: choose the path and select the data file, and R will read it. After that, give the command

> View(Disorder_Data)

and the data will be displayed.

Page 17: Data visualization in Health related research

Correspondence Analysis using R (contd.)

Since the data selected for analysis have missing values, these need to be eliminated before doing CA; otherwise it will not give any meaningful output.

Step 4: Create a new Excel file in CSV format, with the variables having frequencies in the cells.

After the new file is formed, repeat Step 3: read the file for analysis (here DD is the name of the data object).

Page 18: Data visualization in Health related research

Correspondence Analysis using R (contd.)

Step 5: For CA, write the following commands and then generate the plot

> ca_analysis <- ca(DD)

> plot(ca_analysis, col=1, cex=.6, xlim=c(-.225,.375), ylim=c(-.4,.4))

> plot(ca_analysis, col=1, cex=.01, xlim=c(-.225,.375), ylim=c(-.4,.4))

Page 19: Data visualization in Health related research

Correspondence Analysis using R (contd.)

Step 6: To see the summary enter the following

command

> summary(ca_analysis)
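To see what correspondence analysis computes, the sketch below reproduces its core in base R with a singular value decomposition instead of calling the ca package. This is a swapped-in illustration, not the package's own code, and the frequency table (disorder by age group) is hypothetical.

```r
# Hypothetical disorder-by-age-group frequency table (illustrative counts)
N <- matrix(c(30, 20, 10,
              15, 25, 20,
              10, 15, 35),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Depression", "Anxiety", "Psychosis"),
                            c("Young", "Middle", "Old")))

P <- N / sum(N)    # correspondence matrix (cell proportions)
r <- rowSums(P)    # row masses
k <- colSums(P)    # column masses

# Standardized residuals from the independence model
S <- diag(1 / sqrt(r)) %*% (P - r %o% k) %*% diag(1 / sqrt(k))
sv <- svd(S)

inertia <- sv$d^2                                        # principal inertias
row_coord <- diag(1 / sqrt(r)) %*% sv$u %*% diag(sv$d)   # row principal coordinates
rownames(row_coord) <- rownames(N)

total_inertia <- sum(inertia)
chi2 <- unname(chisq.test(N)$statistic)   # Pearson chi-square of the same table

print(round(row_coord[, 1:2], 3))
print(c(total_inertia = total_inertia, chi2_over_n = chi2 / sum(N)))
```

The total inertia equals the chi-square statistic divided by the number of cases, which is why a table with no association gives a map with all points near the origin.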

Page 20: Data visualization in Health related research

Data visualization In Ordinal Data

Steps for representing ordinal data graphically in Excel

1. Enter the data in Excel.

2. Rank the data using the RANK function, e.g. =RANK(B1,$B$1:$B$12)

3. Calculate the median of the ranks.

4. Assign a negative sign to ranks below the median.

5. Select the data and custom sort the ranks from smallest to largest.

6. Select the ‘ranks’ column, go to the ‘Insert’ tab, and select ‘Clustered Bar’ from ‘Bar Chart’.
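The same steps can be sketched in R. The scores are hypothetical, and one difference is worth noting: R's rank() gives rank 1 to the smallest value, whereas Excel's RANK gives rank 1 to the largest value unless its order argument is set.

```r
# Hypothetical scores for six respondents (illustrative only)
score <- c(12, 7, 15, 9, 20, 5)

ranks  <- rank(score)                          # rank 1 = smallest score
med    <- median(ranks)                        # median of the ranks
signed <- ifelse(ranks < med, -ranks, ranks)   # negative sign below the median
sorted <- sort(signed)                         # smallest to largest

# Horizontal bars: negative ranks extend left, positive ranks extend right
barplot(sorted, horiz = TRUE, main = "Signed ranks", xlab = "Rank")
print(sorted)
```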

Page 21: Data visualization in Health related research

Multiple graphs in single page

Nobody will teach you data analytics for free.

Courses cost from 30k to 1 lakh or more.

Data Visualization with R (4 graphs, 2 X 2)

>par(mfrow=c(2,2))

>boxplot(x$Safe_school, main="Box-Plot of safe school")

>boxplot(x$Delinquency, main="Delinquency")

> hist(x$Safe_school, main="Histogram of safe school")

> hist(x$Delinquency, main="Histogram of Delinquency")

Page 22: Data visualization in Health related research

Data visualization In Regression

Data Visualization class with R (Regression)

>cor(x,use="complete.obs", method="pearson" )

>plot(x$Safe_school~x$Delinquency, xlab="Less_delinquency", ylab="Safe_school_perception", main="Relation of Safe school perception and Delinquency")

>lm(formula = x$Safe_school ~ x$Delinquency)

>abline(26.79,3.20)

Page 23: Data visualization in Health related research

Health diversity

Health diversity is a critical factor for developing health literacy.

Besides biological reasons, age, gender, caste, religion, and socio-economic status cause diversity in health behaviour.

Page 24: Data visualization in Health related research

Health infographics and Data Visualization in Health Communication

Infographics are visual representations of facts, events, and numbers, and can be used to depict health statistics, risk

assessments, and resources to name a few. The application of visual pattern, illustration, and iconography enhance the way

information is cognitively consumed. Therefore, infographics demand creativity such that they are designed with appeal and

comprehension targeted to a particular audience, are context-specific, and center on relevant themes from a particular

dataset.

1. Easy to remember: visual impact can play an important role in how memorable the information is to the reader.

Ineffective presentation of data can turn off target audiences, resulting in missed messages.

2. Data sets are often difficult for the general public to find. With the click of a mouse, health concepts and statistics can

be shared virally, from any site to the next, on personal pages, or in journalistic formats; where raw data is much

more difficult to interpret, communicate, and embed. Infographics are designed around themes and concepts that

concisely tell a story, eliminating the need for sharers to reinvent.

3. Data visualization differs slightly from infographics: data visualization is all about the numbers, abstracting data sets

into schematics that clarify statistics in the form of graphs, maps or charts. These are usually constructed

scientifically and automatically by software. Where infographics are designed to communicate a story, data

visualization is less holistic and more quantifiable.

Page 25: Data visualization in Health related research

Mental Health Infographics with Pie Chart

Page 26: Data visualization in Health related research
Page 27: Data visualization in Health related research

Mental Health Infographics with Histogram

Page 28: Data visualization in Health related research

Mental Health Infographics with Correspondence map

Page 29: Data visualization in Health related research

Chi-square

> tb1=table(x$AGE.CODE,x$CASTE)

> tb1

    1   2   3

1 219  19   2

2 219  21   0

> chisq.test(tb1)

Pearson's Chi-squared test

data: tb1

X-squared = 2.1, df = 2, p-value = 0.3499

Warning message:

In chisq.test(tb1) : Chi-squared approximation may be incorrect
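The warning appears because some expected counts are very small. A minimal sketch, re-entering the table from this slide by hand, shows the expected counts and two commonly used alternatives (an exact test and a simulated p-value); which alternative suits a given study is a judgment call.

```r
# The 2 x 3 table from the slide (age code by caste), entered column by column
tb1 <- matrix(c(219, 219, 19, 21, 2, 0), nrow = 2,
              dimnames = list(AGE.CODE = c("1", "2"), CASTE = c("1", "2", "3")))

res <- suppressWarnings(chisq.test(tb1))
print(res$expected)   # the third column has expected counts of 1

# Two alternatives when expected counts are small
exact <- fisher.test(tb1)                                   # exact test
mc    <- chisq.test(tb1, simulate.p.value = TRUE, B = 2000) # Monte Carlo p-value

print(c(chisq = unname(res$statistic),
        fisher_p = exact$p.value,
        mc_p = mc$p.value))
```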

Page 30: Data visualization in Health related research

T-test in r

# independent 2-group t-test

t.test(y~x) # where y is numeric and x is a binary factor

# independent 2-group t-test

t.test(y1,y2) # where y1 and y2 are numeric

# paired t-test

t.test(y1,y2,paired=TRUE) # where y1 & y2 are numeric

# one sample t-test

t.test(y,mu=3) # Ho: mu=3
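A short sketch of these calls on made-up data. The before/after scores and the data frame name long are assumptions for illustration only.

```r
# Hypothetical scores for the same five people before and after a programme
before <- c(5, 6, 7, 8, 9)
after  <- c(6, 8, 8, 11, 10)

# Long format for the formula interface: one numeric column, one binary factor
long <- data.frame(score = c(before, after),
                   time = factor(rep(c("before", "after"), each = 5),
                                 levels = c("before", "after")))

welch  <- t.test(score ~ time, data = long)      # independent groups (Welch by default)
paired <- t.test(before, after, paired = TRUE)   # same people measured twice
one    <- t.test(before, mu = 3)                 # H0: population mean is 3

print(welch$estimate)    # the two group means
print(paired$estimate)   # mean of the pairwise differences
print(one$p.value)
```

The paired test uses the differences within each person, so it should be chosen only when the two vectors are matched row by row.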

Page 31: Data visualization in Health related research

Non-Parametric Test

# independent 2-group Mann-Whitney U Test

wilcox.test(y~A)

# where y is numeric and A is a binary factor

# independent 2-group Mann-Whitney U Test

wilcox.test(y,x) # where y and x are numeric

# dependent 2-group Wilcoxon Signed Rank Test

wilcox.test(y1,y2,paired=TRUE) # where y1 and y2 are numeric

# Kruskal Wallis Test One Way Anova by Ranks

kruskal.test(y~A) # where y is numeric and A is a factor

# Randomized Block Design - Friedman Test

friedman.test(y~A|B)
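A sketch of three of these tests on one small made-up data set. The data frame dat, its values, and the level names t1 to t3 are hypothetical; the values are all distinct so that no tie warnings arise.

```r
# Hypothetical scores: 3 treatments (A) measured once in each of 4 blocks (B)
dat <- data.frame(
  y = c(10, 14, 19,  11, 15, 21,  9, 16, 18,  12, 13, 22),
  A = factor(rep(c("t1", "t2", "t3"), times = 4)),
  B = factor(rep(1:4, each = 3))
)

kw <- kruskal.test(y ~ A, data = dat)        # compares treatments, ignores blocks
fr <- friedman.test(y ~ A | B, data = dat)   # ranks treatments within each block

two <- droplevels(subset(dat, A != "t2"))    # keep two groups for Mann-Whitney
mw  <- wilcox.test(y ~ A, data = two)

print(c(kruskal_df = unname(kw$parameter), friedman_df = unname(fr$parameter)))
print(c(kruskal_p = kw$p.value, friedman_p = fr$p.value, mann_whitney_p = mw$p.value))
```

The Friedman test needs exactly one observation for every treatment-block combination, which is why the data are laid out as a complete grid.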

Page 32: Data visualization in Health related research

Why R?

In my data visualisation classes, I often used graphics through SPSS. SPSS is menu driven, so as a researcher I find I have little control over the program.

Second, the graphic output of SPSS is not appealing, for example in its line thickness.

> x <- c(1,2,3,4,5)

> x

[1] 1 2 3 4 5

> hist(x)

Here I created one vector (variable) named x (lower case) and drew its histogram. Sometimes I am confused about which is right, SPSS or R: for the same vector, SPSS and R can produce different graphs.

R programs can draw on more than 7000 libraries. R is regularly updated by data scientists all around the world, and all facilities are open access. Finally, there are more types of graphics in R. Furthermore, R is interactive, as if you are talking with the machine. Here is one example.

> x=5+4

> x

[1] 9

>

I read Garrett and Guilford and learnt many formulas, but when I use SPSS I feel awkward: the formulas have no use there, and I am acting as a machine with no freedom to control. In R, I can control my analysis.

Page 33: Data visualization in Health related research

How can I install R?

There are two versions. I prefer RStudio. RStudio works on top of R, so R itself must also be installed. Following are the steps:

Go to https://www.rstudio.com/products/rstudio/download/ In the ‘Installers for Supported Platforms’ section, choose and click the RStudio installer for your operating system.

The download should begin as soon as you click. Run the installer and click Next, Next, Finish to complete the installation.

To start RStudio, click on its desktop icon or use ‘search windows’ to access the program. It looks like this:

Page 34: Data visualization in Health related research

Why R Studio?

In data visualisation, it is important to manipulate the attributes of a graph. RStudio makes each change visible interactively.

RStudio is a free and open-source integrated development environment (IDE) for R, a programming language for statistical computing and

graphics. RStudio was founded by JJ Allaire, creator of the programming language ColdFusion. Hadley Wickham is the Chief Scientist at

RStudio.

Researchers change attributes in the R console, and the results are displayed in the graphical output window. Old scripts are visible in the script window, and the current R environment is visible in the environment window, which helps in understanding stored variables. For example:

>Iq<- 100

>Iq1<- 120

>ls()

[1] "Iq" "Iq1"

One can remove the variable by

>rm(Iq)

Page 35: Data visualization in Health related research

How can I transfer file?

I have noticed that R is friendlier to CSV files. CSV stands for "comma-separated values". CSV is a simple file format used to store tabular data, such as a spreadsheet or database. Files in the CSV format can be imported to and exported from programs that store data in tables, such as Microsoft Excel.

So, my suggestion is to convert your spreadsheet (with a header row) into a CSV file by using Save As <file name>.csv

Page 36: Data visualization in Health related research

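How can I transfer file? (contd.)

Input file

a,b,c

10,20,30

10,20,30

In console, type

my.data = read.table(file.choose(), header=TRUE)

Here the file.choose() command asks you to show the file source. Note that read.table() separates columns by white space unless sep="," is given, so the comma-separated file is read as a single column: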

> my.data

a.b.c

1 10,20,30

2 10,20,30

If you know the source, then write

my.data<-read.csv("C:/Users/DDROY/Desktop/test.csv")

You can remove data file by

>rm(my.data)
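The difference between the two read commands can be checked without a file dialog. This sketch writes the three-line input file to a temporary location (the path is created by tempfile(), so nothing on the desktop is touched) and reads it three ways.

```r
# Write the three-line input file shown on this slide to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "10,20,30", "10,20,30"), path)

wrong      <- read.table(path, header = TRUE)             # default separator is white space
right      <- read.table(path, header = TRUE, sep = ",")  # separator given explicitly
also_right <- read.csv(path)                              # read.csv assumes header and comma

print(dim(wrong))   # a single column holding the text "10,20,30"
print(dim(right))   # three columns: a, b, c
str(also_right)
```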

Page 37: Data visualization in Health related research

Is interactive mode possible?

Interactive refers to a two-way flow of information between a computer and a computer user: the computer responds to the user's input.

R programming is interactive. Researchers can communicate with the computer as if with a friend. The only thing you have to learn is its own language.

The sign '>' is the prompt: R shows it when it is ready for a command, and you type the command after it.

For example

>x=10/5

>x

[1] 2

Or

>x=c("Kolkata","Delhi")

>x

[1] "Kolkata" "Delhi"

So the same variable name can hold numeric data at one time and alphanumeric data at another.
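A small sketch of what happens to the type of x in this example. The names first_class, second_class and mixed are illustrative only. Reassigning a name replaces the old value; a single vector cannot hold numbers and text side by side.

```r
x <- 10 / 5
first_class <- class(x)      # type of x while it holds a number

x <- c("Kolkata", "Delhi")   # the old value is replaced, not mixed in
second_class <- class(x)     # type of x while it holds text

mixed <- c(1, "Delhi")       # mixing types in one vector turns everything into text

print(c(first_class, second_class, class(mixed)))
print(mixed)
```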

Page 38: Data visualization in Health related research

Scatter Plot

Scatter plot (also called a scatter graph, scatter chart, scattergram, or scatter diagram) is a type of plot or mathematical

diagram using Cartesian coordinates to display values for typically two variables for a set of data.

A scatter plot is useful for understanding the relation between two variables. Usually the explanatory variable is on the X-axis and the dependent variable, which changes with it, is on the Y-axis.

Page 39: Data visualization in Health related research

Scatter Plot (contd.)

R is very good for statistics. In the input file, write variable name as header. And in read command, write header = T or TRUE.

> hp<-read.csv("aust.csv",header=T,sep=",")

> hp

Year NSW Vic. Qld SA WA Tas

1 1917 1904 1409 683 44 0 3 6

2 1927 2402 1727 873 56 5 3 92

3 1937 2693 1853 993 58 9 4 57

4 1947 2985 2055 110 6 6 46 502

5 1957 3625 2656 141 3 8 73 688

6 1967 4295 3274 170 0 1 110 87

7 1977 5002 3837 213 0 1 286 12

8 1987 5617 4210 267 5 1 393 14

9 1997 6274 4605 340 1 1 480 17

> plot(NSW~Year, data=hp, pch=16)

Page 40: Data visualization in Health related research

Steps for Simple Regression in R

Step 1: Correlation

cor(x,use="complete.obs", method="pearson" )

Step 2: Plot

> plot(x$Safe_school~x$Delinquency)

Step 3: Run Regression between Safe school and

Delinquency

> lm(formula = x$Safe_school ~ x$Delinquency)

This will generate Intercept and Coefficients

Step 4: After getting Intercept and Coefficients use the

following command to generate the plot

> abline(Intercept,Coefficient)
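The four steps can be run end to end as follows. This is a sketch on made-up data: the data frame df and its values are hypothetical, with column names borrowed from the slides. Passing the fitted model to abline() avoids copying the intercept and coefficient by hand.

```r
# Hypothetical data with the column names used on the slides
df <- data.frame(Delinquency = 1:10)
df$Safe_school <- 2 + 3 * df$Delinquency + rep(c(1, -1), 5)  # linear trend plus small noise

r_xy <- cor(df$Safe_school, df$Delinquency, method = "pearson")   # Step 1: correlation

plot(Safe_school ~ Delinquency, data = df,                        # Step 2: scatter plot
     main = "Safe school perception and Delinquency")

fit <- lm(Safe_school ~ Delinquency, data = df)                   # Step 3: regression
b <- coef(fit)                                                    # intercept and slope

abline(fit)   # Step 4: regression line, same as abline(b[1], b[2])

print(r_xy)
print(b)
```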

Page 41: Data visualization in Health related research

Scatter Plot using SPSS

Step 1: Graphs-----> Legacy Dialogs----> Scatter/dot-----> Simple

Step 2: Select the Safe_total variable and move it to Y-axis and

School attendance motivation to X-axis

Step 3: Select Caste Code and move it to Panel (Row box)

Step 4: Click Ok

Syntax

GRAPH

/SCATTERPLOT(BIVAR)=SAM_total WITH safe_total

/PANEL ROWVAR=Caste_code ROWOP=NEST

/MISSING=LISTWISE.

Page 42: Data visualization in Health related research

Histogram

A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous univariate data. This allows the inspection of the data for its underlying distribution (e.g., normal distribution), outliers, skewness, etc.

First, the number of scores in each class interval (bin) is counted. Then the plot is prepared; it shows the frequency of scores in each bin.

A histogram can be drawn in R with the following arguments.

hist(x$V1, main="Anxiety", xlab="Level", ylab="Number of cases", border="blue")

Here

x$V1 = data of V1

main = graph title

xlab = name of x axis

ylab = name of y axis

border = colour of the histogram outline (blue)

>hist(x$V1)

>hist(x$V1, main="my histogram")

>hist(x$V1, main="my histogram", xlab="Anxiety")

>hist(x$V1, main="my histogram", xlab="Anxiety", ylab="frequency ")

Page 43: Data visualization in Health related research

Random Number

We have read about the standard normal distribution, which has mean = 0 and SD = 1, but we have not seen it. Today I will generate such data. Keep in mind that mean = 0 is the ideal (population) value. A random sample will rarely hit it exactly, so the sample mean will only be close to 0.

This is the example:

> x=rnorm(10,0,1) # 10 is number of data, 0 is Mean, 1 is SD.

> x

[1] -0.8161694 1.4684068 0.2120832 -2.2949087 -0.4389617 1.3511973

[7] -0.9904338 -1.8606424 0.3877817 0.5777760

> mean(x)

[1] -0.2403871

Page 44: Data visualization in Health related research

Random Number

> sd(x)

[1] 1.269368

> x=rnorm(10,0,1)

> x

[1] -1.4317261 0.6816895 -2.8454675 -0.6584917 1.1255525 -1.7652789

[7] -0.7825077 -0.2994345 0.2827423 -0.8911193

> mean(x)

[1] -0.6584041

> sd(x)

[1] 1.18645

>
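The two runs above give different means because each call to rnorm() draws a new random sample. A minimal sketch of two related points: set.seed() makes a run repeatable, and larger samples bring the sample mean and SD closer to the ideal values of 0 and 1. The seed value and the sample sizes are arbitrary choices.

```r
set.seed(123)   # fixes the random sequence so the run can be repeated

sizes <- c(10, 1000, 100000)
sample_means <- sapply(sizes, function(n) mean(rnorm(n, mean = 0, sd = 1)))
sample_sds   <- sapply(sizes, function(n) sd(rnorm(n, mean = 0, sd = 1)))

print(data.frame(n = sizes, mean = sample_means, sd = sample_sds))
```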

Page 45: Data visualization in Health related research

Reading file from any directory

R reads data from the working directory. Therefore, we either change the directory first or use a separate argument. Here is an argument with which the researcher points to the source file directly.

>my.data = read.table(file.choose(), header=TRUE)

Check the data with this command

>str(my.data)

Page 46: Data visualization in Health related research

Bar Plot and Histogram

Histograms are used to show distributions of variables, while bar charts are used to compare variables. Histograms plot quantitative data, with ranges of the data grouped into bins or intervals, while bar charts plot categorical data.

x<-read.csv(file.choose(),header=TRUE)

barplot(x$Anxiety)

> x$Anxiety

[1] 42 32 19 42 30 5 11 8 0 5 3 2

Page 47: Data visualization in Health related research

Color Bar Plot

>barplot(x$Anxiety, col="blue", horiz=T, main="health data", xlab="year")

Page 48: Data visualization in Health related research

Line Chart

Monthwise anxiety scores

>v=c(7,12,28,3,41)

>plot(v,type="o",col="red",xlab="Month", main="Anxiety scores over months")

Two line charts

t=c(14,7,6,19,3)

> lines(t,type="o",col="blue")

Page 49: Data visualization in Health related research

Stem-and-Leaf Plot

A stem-and-leaf plot is a special table where each data value is split into a "stem" (the first digit or digits) and a "leaf" (usually the last digit).

Like in this example:

1 | 2345

2 | 2345678

3 | 12345678912345678

4 | 234

5 | 34

53 is divided into two: 5 is the stem and 3 is the leaf. Likewise for 54.

When the value is a decimal, the digit after the decimal point is the leaf. For example 2.5, 2.6, 2.7, 1.5, 1.9

1 | 59

2 | 567

Here is a data set.

Page 50: Data visualization in Health related research

Stem-and-Leaf Plot (contd.)

> x=c(10,12,22,25,34,22,23)

> stem(x)

The decimal point is 1 digit(s) to the right of the |

1 | 02

1 |

2 | 223

2 | 5

3 | 4

This display is harder to read because R has split each stem into two lines (leaves 0-4 and leaves 5-9).

A more compact display would be

1 | 02

2 | 2235

3 | 4
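The number of lines that stem() uses is controlled by its scale argument. The sketch below, which assumes that halving the scale is enough for this small data set, should collapse the split stems into one line per stem.

```r
x <- c(10, 12, 22, 25, 34, 22, 23)

default_out <- capture.output(stem(x))               # default: scale = 1
compact_out <- capture.output(stem(x, scale = 0.5))  # shorter display

cat(compact_out, sep = "\n")
```

Larger values of scale do the opposite and stretch the plot over more lines, which can help with large data sets.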

Page 51: Data visualization in Health related research

Boxplot

A boxplot is a simple way of representing statistical data on a plot in which a rectangle (the box) spans the first to the third quartile, with a line inside to indicate the median value. Whiskers extend from either side of the box to the most extreme observations that are not outliers, and outliers are drawn as separate points.

>boxplot(x, main="Depression level, High=less depressed", ylab="Scores", xlab="Fig.1. Few people are still depressed")

> boxplot(x, main="Depression level, High=less depressed", ylab="Scores", xlab="Fig.1. Few people are still depressed", col="darkgreen")

Page 52: Data visualization in Health related research

Boxplot (Contd.)

I presented the boxplot of a single variable. Here is the distribution of several variables.

The data are entered into x, so the command is:

> boxplot(x)

Since the variables are not all on the same scale, the distribution is not clear. Therefore, always keep all the variables on the same scale.

Here is the data.frame:

> str(x) # command for data structure

'data.frame': 9 obs. of 7 variables:

Page 53: Data visualization in Health related research

Boxplot (Contd.)

$ Year: int 1917 1927 1937 1947 1957 1967 1977 1987 1997

$ NSW : int 1904 2402 2693 2985 3625 4295 5002 5617 6274

$ Vic.: int 1409 1727 1853 2055 2656 3274 3837 4210 4605

$ Qld : int 683 873 993 110 141 170 213 267 340

$ SA : Factor w/ 8 levels "0 1","1 1","3 8",..: 4 6 7 8 3 1 1 5 2

$ WA : Factor w/ 9 levels "0 3","110","286",..: 1 7 9 5 8 2 3 4 6

$ Tas : int 6 92 57 502 688 87 12 14 17

>

Page 54: Data visualization in Health related research

Boxplot (Contd.)

In earlier class lectures, you have understood the importance of the box-and-whisker plot developed by John Tukey. I presented boxplots of single and multiple variables. In the case of multiple variables, I stressed similar scaling of all variables. In this lecture, I will show you outliers. An outlier affects the central tendency. An outlier may arise from a typing mistake, or it may be real data. Outliers also distort correlation coefficients. Therefore it is important to examine whether outliers exist.
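A minimal sketch of how an outlier can be located and how much it moves the mean. The scores are hypothetical; boxplot.stats() applies the same rule that boxplot() uses to draw separate points.

```r
# Hypothetical scores; 40 may be a typing mistake for 4
scores <- c(2, 3, 3, 4, 4, 5, 5, 6, 40)

flagged <- boxplot.stats(scores)$out   # points beyond 1.5 x the box height
mean_with <- mean(scores)
mean_without <- mean(scores[!scores %in% flagged])

boxplot(scores, main = "Scores with one outlier")
print(list(outliers = flagged,
           mean_with = mean_with,
           mean_without = mean_without,
           median = median(scores)))
```

The median barely reacts to the extreme value, while the mean doubles, which is one reason to inspect the boxplot before reporting central tendency.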

Page 55: Data visualization in Health related research

Observation Location

In a univariate plot, three things are important.

A. location

B. dispersion

C. distribution

Here is one way to show the location of observations:

A. Enumerative plots, in which all observations are shown, have the advantage of not losing any specific information–the values of the

individual observations can be retrieved from the plot. The disadvantage of such plots arises when there are a large number of observations–

it may be difficult to get an overall view of the properties of a variable. Enumerative plots do a fairly good job of displaying the location,

dispersion and distribution of a variable, but may not allow a clear comparison of variables, one to another.

Page 56: Data visualization in Health related research

Observation Location (Contd.)

Commands

A. > plot(x$RPM)

B. > hist(x$RPM)

C. > stem(x$RPM)

The decimal point is 1 digit(s) to the right of the |

3 | 3

3 | 57

4 | 11122223

4 | 555556667777888899999

5 | 000000122222333333334444

5 | 555555555555556666666666777777888888999

6 | 00

Page 57: Data visualization in Health related research

Data Frame

A data frame is used for creating a table. Here, data from different sources are merged into a single data frame.

Sources

>a=c(2,3,4,5) # storing numeric.

>b=c("Delhi","Kolkata","Chennai","Mumbai") #storing non-numeric.

Above, numeric data are stored in a and non-numeric data are stored in b.

The command data.frame is used to form a new data table. This is stored in d.

>d=data.frame(a,b)

Both a and b are stored in d as columns.

>d

The new data table will be displayed.

>head(d) # a and b are headers.

>d=data.frame(b,a)

> d

Page 58: Data visualization in Health related research

Data Frame (Contd.)

b a

1 Delhi 2

2 Kolkata 3

3 Chennai 4

4 Mumbai 5

> d[,1]

[1] Delhi Kolkata Chennai Mumbai

Levels: Chennai Delhi Kolkata Mumbai

Page 59: Data visualization in Health related research

Data Frame (Contd.)

> names(d)

[1] "b" "a"

EXTENDING DATA FRAME

> f=c("chocolate","cocacola","orange","apple") # new variable f is added

> d=data.frame(b,a,f)

> d

b a f

1 Delhi 2 chocolate

2 Kolkata 3 cocacola

3 Chennai 4 orange

4 Mumbai 5 apple

>pie(a,b)
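The "Levels:" line in the d[,1] output earlier appears when a text column is stored as a factor. Since R 4.0.0, data.frame() keeps text as character by default, so the behaviour depends on the R version unless stringsAsFactors is set explicitly. A sketch of both settings (the names d_fac and d_chr are illustrative):

```r
a <- c(2, 3, 4, 5)
b <- c("Delhi", "Kolkata", "Chennai", "Mumbai")

d_fac <- data.frame(b, a, stringsAsFactors = TRUE)    # text column stored as a factor
d_chr <- data.frame(b, a, stringsAsFactors = FALSE)   # text column kept as character

print(names(d_fac))      # column names follow the order of the arguments
print(levels(d_fac$b))   # factor levels are sorted alphabetically
print(class(d_chr$b))

# Pie chart with the cities as slice labels
pie(d_chr$a, labels = d_chr$b, main = "Share by city")
```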

Page 60: Data visualization in Health related research

Class Interval

>score=c(10,15,10,20,20,25,30,20,30,32,40,45,48,50)

> stem(score)

The decimal point is 1 digit(s) to the right of the |

1 | 005

2 | 0005

3 | 002

4 | 058

5 | 0

> summary(score)

Min. 1st Qu. Median Mean 3rd Qu. Max.

10.00 20.00 27.50 28.21 38.00 50.00

Page 61: Data visualization in Health related research

Class Interval (contd.)

> table(score)

Score

10 15 20 25 30 32 40 45 48 50

2 1 3 1 2 1 1 1 1 1

> bins=seq(2,50,by=2)

> bins

[1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50

Page 62: Data visualization in Health related research

Class Interval (contd.)

> v1.cut=cut(v1,bins,right=F)

> v1freq=table(v1.cut)

> v1freq

v1.cut

[2,4) [4,6) [6,8) [8,10) [10,12) [12,14) [14,16) [16,18) [18,20) [20,22)

7 20 40 24 6 0 0 0 0 0

[22,24) [24,26) [26,28) [28,30) [30,32) [32,34) [34,36) [36,38) [38,40) [40,42)

0 0 0 0 0 0 0 0 0 0

[42,44) [44,46) [46,48) [48,50)

0 0 0 0

>
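The same cut() and table() steps can be applied to the score vector defined earlier in this section. In this sketch the class width of 10 and the upper limit of 60 are assumptions, chosen so that the top score of 50 falls inside the last interval when right = FALSE.

```r
score <- c(10, 15, 10, 20, 20, 25, 30, 20, 30, 32, 40, 45, 48, 50)

bins <- seq(10, 60, by = 10)                    # class limits 10, 20, ..., 60
score_cut <- cut(score, bins, right = FALSE)    # intervals closed on the left: [10,20), ...
freq <- table(score_cut)

print(freq)
barplot(freq, main = "Scores by class interval", ylab = "Frequency")
```

If the last break equalled the maximum score, right = FALSE would leave that score outside every interval and cut() would return NA for it.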

Page 63: Data visualization in Health related research

Naming and Name Calling

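When we are born, our parents give us a name, and we also get a title (surname) indicating our heredity or root. Likewise, we give names to variables.

Example: four people whose names begin with Deb.

>Deb=c("debi", "debjyoti", "debbani", "debasree")

R is case sensitive: one cannot call "Debi", because the stored name is "debi" with a small d. In the same way, Deb and deb are different variables.

Another thing is that you have to call debi through the title, by its position in Deb. So

>Deb[1]

>Deb[2]

The $ sign calls a member by name only in a data frame or a named list (for example x$Age), not in a plain vector like Deb.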

Page 64: Data visualization in Health related research

Array use (storing,retrieving,transforming, frequency table, proportion and histogram)

Arrays are the R data objects which can store data in more than two dimensions.

> x=c(1,2,3)

> y=x+2

> y

[1] 3 4 5

Read the data from file and store them

>x=read.csv(file.choose(),header=T)

> str(x)

'data.frame': 82 obs. of 34 variables:

$ SL.NO : int 1 2 3 4 5 6 7 8 9 10 ...

$ NAME : Factor w/ 82 levels " DILIP KUMAR ROY",..: 26 37 15 59 34 50 79 67 68 24 ...

$ area : Factor w/ 2 levels "BANSBARI","BHUYANPARA": 2 2 2 2 2 2 2 2 2 2 ...

Page 65: Data visualization in Health related research

Array use (storing,retrieving,transforming, frequency table, proportion and histogram) Contd.

$ area_code: int 2 2 2 2 2 2 2 2 2 2 ...

$ J1 : int 3 3 1 3 4 5 3 5 4 4 ...

$ J2 : int 4 4 3 4 4 4 4 4 4 4 ...

$ J3 : int 4 4 0 4 4 4 3 5 4 4 ...

$ J4 : int 2 3 0 3 4 4 3 NA 3 4 ...

$ J5 : int 3 3 0 4 3 3 0 4 4 3 ...

$ J6 : int 4 4 2 4 4 5 4 4 4 3 ...

$ J7 : int 4 4 0 4 3 4 4 4 4 4 ...

$ J8 : int 4 4 0 4 0 0 4 0 0 0 ...

Page 66: Data visualization in Health related research

Array use (storing,retrieving,transforming, frequency table, proportion and histogram) Contd.

$ J9 : int 0 0 0 1 0 0 0 0 0 0 ...

$ J10 : int 5 5 NA 5 0 0 5 5 4 5 ...

$ J11 : int 2 NA 4 3 3 3 3 4 3 3 ...

$ J12 : int 3 1 NA 3 2 3 4 5 4 4 ...

$ J13 : int 3 2 2 3 3 3 0 4 4 3 ...

$ J14 : int 1 0 0 3 1 4 1 4 3 4 ...

$ J15 : int 3 2 0 3 3 3 2 3 3 3 ...

$ J16 : int 0 0 0 0 0 0 0 0 0 0 ...

$ J17 : int 3 0 0 3 0 0 3 0 0 0 ...

$ J18 : int 4 2 4 3 4 4 4 4 4 4 ...

Page 67: Data visualization in Health related research

Array use (storing,retrieving,transforming, frequency table, proportion and histogram) Contd.

$ J19 : int 4 3 0 3 0 0 1 4 0 0 ...

$ J20 : int 0 0 0 1 0 0 0 4 NA 0 ...

$ J21 : int 3 1 0 3 3 4 0 NA 2 3 ...

$ J22 : int 1 1 0 4 2 2 0 4 3 4 ...

$ J23 : int 2 3 0 2 4 4 0 5 4 4 ...

$ J24 : int 2 3 1 1 3 3 1 4 3 3 ...

$ J25 : int 4 3 0 3 0 0 0 3 3 3 ...

$ J26 : int 2 2 0 3 3 3 0 3 3 2 ...

$ J27 : int 5 4 0 4 3 0 4 4 3 4 ...

$ J28 : int 4 4 3 4 0 0 4 4 0 3 ...

$ J29 : int 4 3 0 4 4 4 4 5 4 4 ...

$ J30 : int 4 4 4 4 3 2 4 4 4 4 ...

Page 68: Data visualization in Health related research

Array use (storing,retrieving,transforming, frequency table, proportion and histogram) Contd.

> y=table(x$J1)

> y

0 1 2 3 4 5

1 11 4 37 23 5

> (y/sum(y)*100)

0 1 2 3 4

1.234568 13.580247 4.938272 45.679012 28.395062

5

6.172840

Page 69: Data visualization in Health related research

Array use (storing,retrieving,transforming, frequency table, proportion and histogram) Contd.

>

> z=(y/sum(y)*100)

> z

0 1 2 3 4

1.234568 13.580247 4.938272 45.679012 28.395062

5

6.172840

> barplot(z)

Page 70: Data visualization in Health related research

Change the directory

> dir() # list the files in the working directory

> hpl<-read.csv("RIASEC.csv",header=T,sep=",") # read a CSV file

> hpl<-read.csv("RIASEC.csv") # same result: header=T and sep="," are the defaults

> str(hpl) # show the structure of the object

> hpl<-read.table(file.choose(), header=T, sep=",") # choose the file interactively

> hpl # print the data frame

> hpl<-read.delim(file.choose(),header=T) # choose and read a tab-delimited text file
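A minimal sketch of changing the working directory and reading a CSV file from it. Everything here is hypothetical: the file demo.csv, the object hpl_demo and the columns ID, Age and Sex are made up for illustration and are not the RIASEC.csv data used in the slides. A temporary directory is used so the script can be run anywhere.

```r
# Hypothetical data: the columns ID, Age and Sex are assumptions for illustration.
demo <- data.frame(ID = 1:5, Age = c(23, 31, 27, 45, 38),
                   Sex = c("F", "M", "F", "F", "M"))

old_dir <- setwd(tempdir())        # change the working directory, remembering the old one
write.csv(demo, "demo.csv", row.names = FALSE)

getwd()                            # show the current working directory
dir()                              # list the files in it
hpl_demo <- read.csv("demo.csv")   # header = TRUE and sep = "," are the defaults
str(hpl_demo)                      # compact structure: one line per column

setwd(old_dir)                     # restore the original working directory
```

setwd() returns the previous directory, which makes it easy to switch back once the file has been read.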

Page 71: Data visualization in Health related research

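Change the directory (Contd.)

> names(hpl) # column (variable) names

> hpl[1,1] # first row, first column

> hpl[, 4] # all rows of the fourth column

> hpl$Age # the Age column

> Age<-hpl$Age # copy the column so that it can be used by name

> mean(Age) # average

> sd(Age) # standard deviation

> sd(Age)/mean(Age)*100 # coefficient of variation (%)

> hist(Age) # histogram

> hist(Age, breaks=20) # about 20 bins; breaks is only a suggestion

> plot(Age ~ y) # scatterplot of Age against y, a second numeric vector of the same length

> stem(Age) # stem-and-leaf plot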
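The descriptive steps above can be tried on a small vector. The ages below are hypothetical values chosen for illustration; in the slides the vector would come from hpl$Age.

```r
# Hypothetical ages; in the slides this vector would be hpl$Age.
Age <- c(23, 31, 27, 45, 38, 29, 52, 34, 41, 26)

m  <- mean(Age)     # average
s  <- sd(Age)       # sample standard deviation (n - 1 denominator)
cv <- s / m * 100   # coefficient of variation, in percent

summary(Age)        # minimum, quartiles, mean, maximum
hist(Age, breaks = 5, main = "Age", xlab = "Age (years)")
stem(Age)           # stem-and-leaf display
```

The coefficient of variation expresses the spread relative to the mean, so it can be compared across variables measured in different units.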

Page 72: Data visualization in Health related research

Health is a state of complete physical, mental and social well-being, and not merely the absence of disease or infirmity.

Page 73: Data visualization in Health related research

Data Visualization

"Data" is plural; "datum" is singular.

Use a plural verb with "data".

Data are not necessarily numeric; they can also be text, audio or pictures.

Page 74: Data visualization in Health related research

Health is a directed continuum. It can be assessed with both metric and non-metric measurement scales, and with uni-, bi- and multivariate data visualization tools.

Page 75: Data visualization in Health related research

Data visualization

Data visualization is a general term that describes any effort to help people understand the significance of data, and to communicate it, by placing it in a visual context.

It is important in health psychological research for describing the health condition, its determinants and its promotion.

Data provide information about psychological and behavioral processes in health, illness, and healthcare.

Visual data also help us understand how psychological, behavioral, and cultural factors contribute to physical health and illness.

Page 76: Data visualization in Health related research

History

Historically, data visualization has evolved through the work of noted practitioners. William Playfair is regarded as the founder of graphical methods in statistics. He invented four types of graphs:

● the line graph,

● the bar chart of economic data,

● the pie chart, and

● the circle graph.

Page 77: Data visualization in Health related research

History -2

Joseph Priestley created the first timeline charts, in which individual bars were used to visualize the life span of a person (1765). That’s right: timelines were invented more than 250 years ago.

Among the most famous early data visualizations is Napoleon’s March as depicted by Charles Minard. The visualization packs in extensive information on the effect of temperature on Napoleon’s invasion of Russia, along with time scales. The graphic is notable for representing six types of data in two dimensions: the number of Napoleon’s troops; distance; temperature; latitude and longitude; direction of travel; and location relative to specific dates.

Florence Nightingale was also a pioneer in data visualization. She drew coxcomb charts to depict the effect of disease on troop mortality (1858).

The use of maps in graphs, or spatial analytics, was pioneered by John Snow (not the one from Game of Thrones!). His map of deaths from a cholera outbreak in London in 1854, drawn in relation to the locations of public water pumps, helped pinpoint the outbreak to a single pump.

Page 78: Data visualization in Health related research

Process

In health psychological research and care, data visualization is important for acquiring, storing, retrieving and using health care information to foster better collaboration among healthcare providers.

Disease patterns are changing in structure and process. Data visualization can gauge this change by clustering symptoms over periods.

Health infographics can be used for community service. Data visualization also helps in health surveillance. By analysing social media posts, the outbreak and geographical span of a condition such as depression can be identified. The surveillance system provider can then take measures to stop it.

Page 79: Data visualization in Health related research

plot (Year ~ NSW, data=x, pch=16)

Page 80: Data visualization in Health related research

Uni-, bi- and multivariate data visualization tools are useful for understanding health status and health related associations.

Page 81: Data visualization in Health related research

Univariate Statistics

Page 82: Data visualization in Health related research
Page 83: Data visualization in Health related research

Univariate analysis is the simplest form of data analysis. “Uni” means “one”, so the data have only one variable.

A variable in univariate analysis is just a condition or subset that your data fall into. You can think of it as a “category.” For example, the analysis might look at the variable “age”, or it might look at “height” or “weight”. However, it does not look at more than one variable at a time; otherwise it becomes bivariate analysis (or, with three or more variables, multivariate analysis).
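The distinction can be illustrated with a short sketch. The two vectors below are hypothetical measurements for six people, invented for this example; they do not come from the slides' data.

```r
# Hypothetical measurements for six people
age    <- c(23, 31, 27, 45, 38, 29)
weight <- c(58, 72, 65, 80, 77, 61)

# Univariate: one variable at a time
summary(age)
boxplot(age, ylab = "Age (years)")

# Bivariate: two variables considered together
r <- cor(age, weight)          # Pearson correlation between the two variables
plot(weight ~ age, pch = 16)   # scatterplot of weight against age
```

The summary and the boxplot describe age alone, while the correlation and the scatterplot describe how age and weight vary together.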
