Data visualization in Health related research


Debdulal Dutta Roy, Indian Statistical Institute, Kolkata

●The Policy seeks to reach everyone in a comprehensive, integrated way to move towards wellness. It aims at achieving universal health coverage and delivering quality health care services to all at affordable cost.

●The policy is patient-centric and quality-driven. It addresses health security and Make in India for drugs and devices.

●The main objective of the National Health Policy 2017 is to achieve the highest possible level of good health and well-being, through a preventive and promotive health care orientation in all developmental policies, and to achieve universal access to good quality health care services without anyone having to face financial hardship as a consequence.

Why should you learn Health Data Visualization?

16.3.2017


1. Quality control chart

2. Quality diversity or stratification

3. Quality correlates

4. Quality monitoring

5. Quality prediction

6. Quality Assurance


Efficiency: decreasing costs by avoiding duplicative or unnecessary diagnostic or therapeutic interventions.

Enhancing quality: enhancing the quality of health care by involving consumers as additional power for quality assurance, and directing patient streams to the best quality providers.

Evidence based: e-Health interventions should be evidence based in the sense that their effectiveness and efficiency should not be assumed but proven by rigorous evaluation.

Empowerment of consumers and patients: by making the knowledge bases of medicine and personal electronic records accessible to consumers over the internet, e-Health opens new avenues for patient-centered medicine and enables evidence-based patient choice.

[Diagram: Health Data Visualization supports access, quality, affordability, lowering of disease burden, efficiency monitoring and health surveillance.]


Encouragement: A new relationship between the patient and the health professional is developed, a true partnership in which decisions are made in a shared manner.

Education: Physicians are educated through online resources such as medical education, and consumers through health education, preventive information etc.

Extending: The scope of health care is extended beyond its conventional boundaries, in both a geographical and a conceptual sense. e-Health enables consumers to easily obtain health services online from global providers.

Ethics: e-Health involves new forms of patient-physician interaction and poses new challenges and threats to ethical issues such as online professional practice, informed consent, privacy and equity.

Equity: People who do not have the money, skills, and access to computers and networks cannot use computers effectively. As a result, these patient populations are the least likely to benefit from advances in information technology, unless political measures ensure equitable access for all.

Why is e-Health care important in India?

India is a vast country with complex socio-economic characteristics that are reflected in its medical systems. These include an insufficient number of primary care doctors practising in rural and semi-urban areas and an ongoing need to update the knowledge of those who do work in rural areas.

The practice of qualified doctors can diverge widely from standards of care, and many medical practitioners lack formal qualifications altogether.

Out-of-pocket expenditure, which constitutes around 80% of the total healthcare spending in India, may be further inflated by travel costs in both urban and rural areas.

Thus, many conditions remain untreated, or are managed with prescription medicines purchased over the counter, or by faith healers.

Consequently, the 70% of the population that lives in rural areas, in particular, has limited access to adequate health care.

Finally, epidemiological data are often unavailable or unreliable, hampering the informed design of preventive health programmes.

Data Visualization in health care by Artificial Intelligence

Artificial intelligence (AI) in healthcare uses algorithms and software to approximate human cognition in the analysis of complex medical data.

The primary aim of health-related AI applications is to analyze relationships between prevention or treatment techniques and patient outcomes.

AI programs have been developed and applied to practices such as diagnosis processes, treatment protocol development, drug development, personalized medicine and patient monitoring and care, among others.

Purpose

Health infographics for community awareness

Communicating statistics for academic purposes

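Descriptive Statistics

>mean(x); median(x) # mean and median

>mode(x) # storage mode of x (e.g. "numeric"), not the most frequent value

>sort(x) # sorting from minimum to maximum

>summary(x)

Install and load a package:

>install.packages("psych") # needed only once

>library(psych)

>describe(x)

>sd(x)

>table(x) # frequency table

>prop.table(table(x)) # proportions, computed from the frequency table

>ftable(x,y) # frequency table of multiple vectors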
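In R, mode(x) reports how an object is stored, so the most frequent value has to be found from a frequency table. The following is a minimal sketch under stated assumptions: stat_mode is a hypothetical helper name, and the scores are made up for illustration.

```r
# Hypothetical helper: most frequent value(s) of a vector.
stat_mode <- function(v) {
  freq <- table(v)                       # frequency of each distinct value
  top <- names(freq)[freq == max(freq)]  # value(s) with the highest count
  if (is.numeric(v)) as.numeric(top) else top
}

scores <- c(3, 4, 4, 5, 3, 4, 2)  # illustrative data, not from the lecture

mode(scores)               # storage mode of the vector
stat_mode(scores)          # statistical mode
prop.table(table(scores))  # proportions are computed from the table of counts
```

If two values tie for the highest count, this sketch returns both of them.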

Compare two groups

> t.test(x$Safe_school,x$Delinquency)

>oneway.test(y~y2) # y is metric and y2 is categorical (two or more groups)

> oneway.test(x$Safe_school~x$Delinquency)

One-way analysis of means (not assuming equal variances)

data: x$Safe_school and x$Delinquency
F = 73.84, num df = 8.000, denom df = 81.964, p-value < 2.2e-16

Measurement Scales

Nominal

Ordinal

Interval

Ratio

Data visualization: Bar plot with SPSS

480 archive data were retrieved. Each datum represents a specific psychiatric disorder.

Results show that the largest number of patients suffered from depression, and that the other psychiatric disorders changed with age.

In SPSS: bar plot for variables > use mean, Row = Sex, Col = Age

Depression by age and sex

Stacked Bar plot

Correspondence Analysis using R

Step 1: Download the ca package

Select the Tools option and enter ca in the space provided. Click on the Install button. The package will be installed.

Correspondence Analysis using R (contd.)

Step 2: Load the library to use the ca package

> library(ca)

Step 3: Reading the file for analysis (here Disorder_Data is the name of the data object)

> Disorder_Data<-read.table(file.choose(),header=T,sep=",",row.names=1)

You have to choose the path and select the data file, and then R will read it. After that, give the command

> View(Disorder_Data)

and the data will be displayed.


Since the data selected for analysis have missing values, those need to be eliminated before doing CA; otherwise it will not give any meaningful output.

Step 4: Create a new Excel file in CSV format in which the cells hold the frequencies of the variables.

After the new file is formed, repeat step 3 to read it for analysis (here DD is the name of the data object).


Step 5: For CA, write the following command and then generate the plot

> ca_analysis<-ca(DD)

> plot(ca_analysis,col=1,cex=.6,xlim=c(-.225,.375),ylim=c(-4,.4))

> plot(ca_analysis,col=1,cex=.01,xlim=c(-.225,.375),ylim=c(-.4,.4))


Step 6: To see the summary, enter the following command

> summary(ca_analysis)
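To see what correspondence analysis computes, it can help to reproduce the core calculation in base R: CA is a singular value decomposition of the standardized residuals of a frequency table. The sketch below is only an illustration under assumptions: the counts, the row and column labels and all object names are invented, and it is not the lecture's data or the ca package's internal code.

```r
# Illustrative counts only (rows: disorders, columns: age groups); not the lecture data.
N <- matrix(c(20, 10,  5,
              15, 25, 10,
               5, 15, 30),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Depression", "Anxiety", "Psychosis"),
                            c("Young", "Middle", "Old")))

P <- N / sum(N)     # correspondence matrix (proportions)
r <- rowSums(P)     # row masses
cm <- colSums(P)    # column masses

# Standardized residuals: (observed - expected) / sqrt(expected), in proportions
S <- diag(1 / sqrt(r)) %*% (P - r %o% cm) %*% diag(1 / sqrt(cm))

dec <- svd(S)
inertia <- dec$d^2  # principal inertias (eigenvalues)

# Principal coordinates of the rows, used for drawing the map
row_coord <- diag(1 / sqrt(r)) %*% dec$u %*% diag(dec$d)
rownames(row_coord) <- rownames(N)

round(inertia / sum(inertia) * 100, 1)  # percentage of inertia per dimension
row_coord[, 1:2]                        # coordinates for a two-dimensional map
```

The total inertia equals the chi-square statistic of the table divided by the number of cases, which is why a table with no association gives an uninformative map.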

Data visualization in Ordinal Data

Steps for representing ordinal data graphically in Excel

1. Enter the data in Excel.

2. Rank the data using the RANK function, e.g. =RANK(B1,$B$1:$B$12)

3. Calculate the median of the ranks.

4. Assign a negative sign to ranks below the median.

5. Custom sort the ranks from smallest to largest after selecting the data.

6. Select the ‘ranks’ column, go to the ‘Insert’ tab and then select ‘Clustered Bar’ from ‘Bar Chart’.
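The same steps can be sketched in R. The scores below are made up for illustration, and the object names are assumptions rather than part of the lecture.

```r
# Illustrative scores for 12 cases; the values are made up.
scores <- c(42, 32, 19, 41, 30, 5, 11, 8, 0, 6, 3, 2)

ranks <- rank(-scores)                        # rank 1 = largest score, as in Excel's default RANK order
med <- median(ranks)                          # median of the ranks
signed <- ifelse(ranks < med, -ranks, ranks)  # negative sign for ranks below the median
signed <- sort(signed)                        # smallest to largest

# Horizontal bars, similar to Excel's clustered bar chart
barplot(signed, horiz = TRUE, main = "Signed ranks", xlab = "Rank")
```

Bars to the left of zero are the cases ranked above the median position, and bars to the right are those ranked below it.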

Multiple graphs in single page

Nobody will teach you data analytics for free.

It costs from 30k to 1 lakh or more.

Data Visualization with R (4 graphs, 2 X 2)

>par(mfrow=c(2,2))

>boxplot(x$Safe_school, main="Box-Plot of safe school")

>boxplot(x$Delinquency, main="Delinquency")

> hist(x$Safe_school, main="Histogram of safe school")

> hist(x$Delinquency, main="Histogram of Delinquency")

Data visualization In Regression

Data Visualization class with R (Regression)

>cor(x,use="complete.obs", method="pearson" )

>plot(x$Safe_school~x$Delinquency,xlab="Less_delinquency", ylab="Safe_school_perception", main="Relation of Safe school perception and Delinquency")

>lm(formula = x$Safe_school ~ x$Delinquency)

>abline(26.79,3.20)
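Instead of retyping the intercept and slope by hand, the fitted model can be stored and passed to abline(). The sketch below uses simulated data, because the lecture's data file is not available here; only the column names Safe_school and Delinquency are taken from the commands above, and the numbers used to simulate them are assumptions.

```r
set.seed(1)  # simulated data standing in for the lecture's file
x <- data.frame(Delinquency = round(runif(50, 1, 10)))
x$Safe_school <- 27 + 3 * x$Delinquency + rnorm(50, sd = 4)

fit <- lm(Safe_school ~ Delinquency, data = x)  # store the fitted model
coef(fit)                                       # intercept and slope

plot(Safe_school ~ Delinquency, data = x,
     xlab = "Less_delinquency", ylab = "Safe_school_perception")
abline(fit)  # draws the fitted line without retyping the coefficients
```

Storing the model in an object also allows summary(fit) to be called for standard errors and R-squared.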

Health diversity

Health diversity is a critical factor in developing health literacy. Besides biological reasons, age, gender, caste, religion and socio-economic status cause diversity in health behaviour.

Health infographics and Data Visualization in Health Communication

Infographics are visual representations of facts, events, and numbers, and can be used to depict health statistics, risk assessments, and resources, to name a few. The application of visual pattern, illustration, and iconography enhances the way information is cognitively consumed. Therefore, infographics demand creativity, such that they are designed with appeal and comprehension targeted to a particular audience, are context-specific, and center on relevant themes from a particular dataset.

1. Easy to remember: visual impact can play an important role in how memorable the information is to the reader. Ineffective presentation of data can turn off target audiences, resulting in missed messages.

2. Data sets are often difficult for the general public to find. With the click of a mouse, health concepts and statistics can be shared virally, from any site to the next, on personal pages, or in journalistic formats, whereas raw data is much more difficult to interpret, communicate, and embed. Infographics are designed around themes and concepts that concisely tell a story, eliminating the need for sharers to reinvent.

3. Data visualization differs slightly from infographics: data visualization is all about the numbers, abstracting data sets into schematics that clarify statistics in the form of graphs, maps or charts. These are usually constructed scientifically and automatically by software. Where infographics are designed to communicate a story, data visualization is less holistic and more quantifiable.

Mental Health Infographics with Pie Chart

Mental Health Infographics with Histogram

Mental Health Infographics with Correspondence map

Chi-square

> tb1=table(x$AGE.CODE,x$CASTE)

> tb1

      1   2   3
  1 219  19   2
  2 219  21   0

> chisq.test(tb1)

Pearson's Chi-squared test

data: tb1

X-squared = 2.1, df = 2, p-value = 0.3499

Warning message:

In chisq.test(tb1) : Chi-squared approximation may be incorrect
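The warning appears because some expected counts are below 5, which makes the chi-squared approximation unreliable. A minimal sketch, using the counts copied from the table above, shows where the small expected counts are and runs Fisher's exact test as one possible alternative; the choice of alternative is a suggestion, not the lecture's method.

```r
# Counts copied from the table above (rows: age code, columns: caste code).
tb1 <- matrix(c(219, 19, 2,
                219, 21, 0),
              nrow = 2, byrow = TRUE,
              dimnames = list(AGE.CODE = c("1", "2"), CASTE = c("1", "2", "3")))

res <- suppressWarnings(chisq.test(tb1))
res$expected      # the third column has expected counts below 5
fisher.test(tb1)  # exact test, does not rely on the chi-squared approximation
```

Another common remedy is to merge sparse categories before testing, if that makes sense for the variable.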

T-test in R

# independent 2-group t-test

t.test(y~x) # where y is numeric and x is a binary factor

# independent 2-group t-test

t.test(y1,y2) # where y1 and y2 are numeric

# paired t-test

t.test(y1,y2,paired=TRUE) # where y1 & y2 are numeric

# one sample t-test

t.test(y,mu=3) # Ho: mu=3

Non-Parametric Test

# independent 2-group Mann-Whitney U Test

wilcox.test(y~A) # where y is numeric and A is a binary factor

# independent 2-group Mann-Whitney U Test

wilcox.test(y,x) # where y and x are numeric

# dependent 2-group Wilcoxon Signed Rank Test

wilcox.test(y1,y2,paired=TRUE) # where y1 and y2 are numeric

# Kruskal Wallis Test One Way Anova by Ranks

kruskal.test(y~A) # where y is numeric and A is a factor

# Randomized Block Design - Friedman Test

friedman.test(y~A|B) # where y is numeric, A is the group factor and B is the blocking factor

Why R?

In my data visualisation classes, I often used graphics through SPSS. SPSS is menu-driven, so as a researcher I find I have little control over the program. Second, the graphic output of SPSS is not appealing, for example in its line thickness.

> x <- c(1,2,3,4,5)

> x

[1] 1 2 3 4 5

> hist(x)

Here I created one vector or variable named x (small letter) and drew its histogram. Sometimes I am confused about which is right, SPSS or R, because for the same vector SPSS and R can produce different graphs.

R programs can draw on more than 7000 libraries. R is regularly updated by data scientists all around the world, and all facilities are open access. Finally, there are more kinds of graphics in R. Furthermore, R is interactive, as if you are talking with the machine. Here is one example.

> x=5+4

> x

[1] 9

I read Garrett and Guilford. I learnt many formulae, but when I use SPSS I feel awkward, because they have no utility there: I am acting as a machine and have no freedom to control. But in R, I can control my analysis.

How can I install R?

There are two versions. I prefer RStudio. RStudio works on top of R, so R itself must also be installed. Following are the steps:

1. Go to https://www.rstudio.com/products/rstudio/download/

2. In the ‘Installers for Supported Platforms’ section, choose and click the RStudio installer for your operating system. The download should begin as soon as you click.

3. Run the installer: click Next, Next, Finish.

4. To start RStudio, click on its desktop icon or use ‘search windows’ to access the program.

Why RStudio?

In data visualisation, it is important to manipulate attributes of the graph. This interactive change is visible through RStudio.

RStudio is a free and open-source integrated development environment (IDE) for R, a programming language for statistical computing and graphics. RStudio was founded by JJ Allaire, creator of the programming language ColdFusion. Hadley Wickham is the Chief Scientist at RStudio.

Researchers change attributes in the R console and the results are displayed in the graphical output window. Earlier script is visible in the script window, and the current R environment is visible in the environment window, which helps in understanding the stored variables. For example:

>Iq<- 100

>Iq1<- 120

>ls()

[1] "Iq" "Iq1"

One can remove a variable by

>rm(Iq)

How can I transfer a file?

I have noticed that R is more friendly to CSV files. CSV stands for "comma-separated values". CSV is a simple file format used to store tabular data, such as a spreadsheet or database. Files in the CSV format can be imported to and exported from programs that store data in tables, such as Microsoft Excel.

So, my suggestion is to convert your spreadsheet, with its header, into a CSV file by using Save As <file name>.csv

How can I transfer a file? (contd.)

Input file

a,b,c
10,20,30
10,20,30

In the console, type

my.data = read.table(file.choose(), header=TRUE)

Here the file.choose() command opens a dialog in which you point to the file.

> my.data

     a.b.c
1 10,20,30
2 10,20,30

Because read.table() separates columns by white space unless told otherwise, each line is read here as a single column. Add sep="," or use read.csv(). If you know the source, then write

my.data<-read.csv("C:/Users/DDROY/Desktop/test.csv")

You can remove the data object by

>rm(my.data)
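The difference between the two read commands can be checked without any file on disk. The sketch below writes the three input lines shown above to a temporary file (an assumption made so that the example is self-contained) and reads it back in three ways.

```r
path <- tempfile(fileext = ".csv")  # temporary file standing in for test.csv
writeLines(c("a,b,c", "10,20,30", "10,20,30"), path)

one_col <- read.table(path, header = TRUE)               # default separator is white space
three_col <- read.table(path, header = TRUE, sep = ",")  # comma given explicitly
same <- read.csv(path)                                   # read.csv presets header = TRUE and sep = ","

ncol(one_col)    # 1
ncol(three_col)  # 3
str(same)
```

For comma-separated files, read.csv() is therefore the shorter and safer call.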

Is interactive mode possible?

Interactive refers to a two-way flow of information between a computer and a computer user, responding to the user's input.

R programming is interactive. Here researchers can communicate with the computer as if with a friend. The only thing is that you have to learn its own language.

The sign '>' is the prompt: R treats what you type after it as a command.

For example

>x=10/5

>x

[1] 2

Or

>x=c("Kolkata","Delhi")

>x

[1] "Kolkata" "Delhi"

So the same variable name can hold either numeric or alphanumeric data.

Scatter Plot

A scatter plot (also called a scatter graph, scatter chart, scattergram, or scatter diagram) is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data.

A scatter plot is useful for understanding the relation between two variables. Usually the explanatory variable is on the X-axis and the dependent variable, which changes with it, is on the Y-axis.

Scatter Plot (contd.)

R is very good for statistics. In the input file, write variable name as header. And in read command, write header = T or TRUE.

> hp<-read.csv("aust.csv",header=T,sep=",")

> hp

Year NSW Vic. Qld SA WA Tas

1 1917 1904 1409 683 44 0 3 6

2 1927 2402 1727 873 56 5 3 92

3 1937 2693 1853 993 58 9 4 57

4 1947 2985 2055 110 6 6 46 502

5 1957 3625 2656 141 3 8 73 688

6 1967 4295 3274 170 0 1 110 87

7 1977 5002 3837 213 0 1 286 12

8 1987 5617 4210 267 5 1 393 14

9 1997 6274 4605 340 1 1 480 17

> plot(NSW~Year, data=hp, pch=16)

Steps for Simple Regression in R

Step 1: Correlation

cor(x,use="complete.obs", method="pearson" )

Step 2: Plot

> plot(x$Safe_school~x$Delinquency)

Step 3: Run Regression between Safe school and

Delinquency

> lm(formula = x$Safe_school ~ x$Delinquency)

This will generate the intercept and the coefficient (slope).

Step 4: After getting the intercept and coefficient, use the following command to add the regression line to the plot

> abline(Intercept,Coefficient)

Scatter Plot using SPSS

Step 1: Graphs-----> Legacy Dialogs----> Scatter/dot-----> Simple

Step 2: Select the Safe_total variable and move it to Y-axis and

School attendance motivation to X-axis

Step 3: Select Caste Code and move it to Panel (Row box)

Step 4: Click Ok

Syntax

GRAPH

/SCATTERPLOT(BIVAR)=SAM_total WITH safe_total

/PANEL ROWVAR=Caste_code ROWOP=NEST

/MISSING=LISTWISE.

Histogram

A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous univariate data. This allows the inspection of the data for its underlying distribution (e.g., normal distribution), outliers, skewness, etc.

Initially, the number of scores in each class interval or bin is counted. Then the plot is prepared, showing the bin-wise frequency of scores.

A histogram can be drawn in R with the following arguments.

hist(x$V1, main="Anxiety", xlab="Level", ylab="Number of cases", border="blue")

Here

x$V1 = data of V1

main = graph title

xlab = name of the x axis

ylab = name of the y axis

border = colour of the border of the bars (here blue)

>hist(x$V1)

>hist(x$V1, main="my histogram")

>hist(x$V1, main="my histogram", xlab="Anxiety")

>hist(x$V1, main="my histogram", xlab="Anxiety", ylab="frequency ")

Random Number

We have read about the standard normal distribution, in which Mean=0 and SD=1. We have read about it but we have not seen it. Today, I will show you data drawn from it. But keep in mind that Mean=0 is the ideal condition. A sample will not hit it exactly, so the sample mean will only be close to 0.

This is the example:

> x=rnorm(10,0,1) # 10 is number of data, 0 is Mean, 1 is SD.

> x

[1] -0.8161694 1.4684068 0.2120832 -2.2949087 -0.4389617 1.3511973

[7] -0.9904338 -1.8606424 0.3877817 0.5777760

> mean(x)

[1] -0.2403871


> sd(x)

[1] 1.269368

> x=rnorm(10,0,1)

> x

[1] -1.4317261 0.6816895 -2.8454675 -0.6584917 1.1255525 -1.7652789

[7] -0.7825077 -0.2994345 0.2827423 -0.8911193

> mean(x)

[1] -0.6584041

> sd(x)

[1] 1.18645

>
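With only 10 values the sample mean and SD can be some distance from 0 and 1, as the two runs above show. A minimal sketch, assuming a fixed seed and a larger sample size chosen only for illustration, shows the estimates moving closer to the ideal values.

```r
set.seed(123)  # fixes the random sequence so that the run can be repeated
small <- rnorm(10, mean = 0, sd = 1)
large <- rnorm(10000, mean = 0, sd = 1)

c(mean(small), sd(small))  # can be noticeably away from 0 and 1
c(mean(large), sd(large))  # much closer to 0 and 1

hist(large, main = "10,000 draws from N(0, 1)", xlab = "Value")
```

The histogram of the large sample makes the bell shape visible, which a sample of 10 cannot do.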

Reading file from any directory

R reads data from the working directory. Therefore, we either change the directory first or use a separate argument. Here is an argument with which the researcher points to the source file.

>my.data = read.table(file.choose(), header=TRUE)

check the data by this command

>str(my.data)

Bar Plot and Histogram

Histograms are used to show distributions of variables, while bar charts are used to compare variables. Histograms plot quantitative data with ranges of the data grouped into bins or intervals, while bar charts plot categorical data.

x<-read.csv(file.choose(),header=TRUE)

barplot(x$Anxiety)

> x$Anxiety

[1] 42 32 19 42 30 5 11 8 0 5 3 2

Color Bar Plot

>barplot(x$Anxiety,col="blue", horiz=T, main="health data ", xlab="year")

Line Chart

Monthwise anxiety scores

>v=c(7,12,28,3,41)

>plot(v,type="o",col="red",xlab="Month", main="Anxiety scores over months")

Two line charts

t=c(14,7,6,19,3)

> lines(t,type="o",col="blue")
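When two series are drawn on one chart, the axis is set by the first plot() call, so a second series with a different range may be cut off. The sketch below is one way to handle this, assuming a common y-axis range and a legend; the legend labels and the name t2 are illustrative assumptions.

```r
v <- c(7, 12, 28, 3, 41)  # first series, from the example above
t2 <- c(14, 7, 6, 19, 3)  # second series; named t2 to avoid masking the function t()

yr <- range(c(v, t2))     # common y-axis range for both series
plot(v, type = "o", col = "red", ylim = yr,
     xlab = "Month", ylab = "Anxiety score", main = "Anxiety scores over months")
lines(t2, type = "o", col = "blue")
legend("topleft", legend = c("Series 1", "Series 2"),
       col = c("red", "blue"), lty = 1, pch = 1)
```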

Stem-and-Leaf Plot

A Stem and Leaf Plot is a special table where each data value is split into a "stem" (the first digit or digits) and a "leaf" (usually the last digit).

Like in this example:

1 | 2345
2 | 2345678
3 | 12345678912345678
4 | 234
5 | 34

53 is divided into two: 5 is the stem and 3 is the leaf. Likewise 54.

When the value is decimal, the digit after the decimal point is the leaf. For example 2.5, 2.6, 2.7, 1.5, 1.9

1 | 59
2 | 567

Here is a data set.

> x=c(10,12,22,25,34,22,23)

> stem(x)

The decimal point is 1 digit(s) to the right of the |

1 | 02

1 |

2 | 223

2 | 5

3 | 4

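But this result is hard to read, because R has split each stem into two rows (leaves 0 to 4 and leaves 5 to 9). With one row per stem it would be

1 | 02
2 | 2235
3 | 4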

Boxplot

Boxplot is a simple way of representing statistical data on a plot in which a rectangle is drawn to represent the second and third quartiles, with a line inside to indicate the median value. Lines (whiskers) extend from either side of the rectangle towards the smallest and largest values.

>boxplot(x, main="Depression level, High=less depressed", ylab="Scores", xlab="Fig.1. Few people are still depressed")

> boxplot(x, main="Depression level, High=less depressed", ylab="Scores", xlab="Fig.1. Few people are still depressed",

col="darkgreen")

Boxplot (Contd.)

I presented the boxplot of a single variable. Here is the distribution of several variables.

The data are entered into x, so the command is:

> boxplot(x)

Since the variables are not all on the same scale, the distribution is not clear. Therefore, always keep all the variables on the same scale.

Here is the data.frame:

> str(x) # command for data structure

'data.frame': 9 obs. of 7 variables:


$ Year: int 1917 1927 1937 1947 1957 1967 1977 1987 1997

$ NSW : int 1904 2402 2693 2985 3625 4295 5002 5617 6274

$ Vic.: int 1409 1727 1853 2055 2656 3274 3837 4210 4605

$ Qld : int 683 873 993 110 141 170 213 267 340

$ SA : Factor w/ 8 levels "0 1","1 1","3 8",..: 4 6 7 8 3 1 1 5 2

$ WA : Factor w/ 9 levels "0 3","110","286",..: 1 7 9 5 8 2 3 4 6

$ Tas : int 6 92 57 502 688 87 12 14 17

>

Boxplot (Contd.)

In earlier class lectures, you have understood the importance of the box-whisker plot developed by John Tukey. I presented boxplots of single and multiple variables. In the case of multiple variables, I spoke about similar scaling of all variables. In this lecture, I will show you the outlier. An outlier affects the central tendency. An outlier may arise from a typing mistake, or it may be real data. Outliers also disturb correlation coefficients. Therefore it is important to examine whether outliers exist.
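A minimal sketch of how outliers can be located and how they affect central tendency. The scores are made up, with 50 playing the role of a possible typing mistake; boxplot.stats() applies the same rule that boxplot() uses to draw points beyond the whiskers.

```r
scores <- c(10, 12, 11, 13, 12, 11, 50)  # made-up scores; 50 plays the outlier

out <- boxplot.stats(scores)$out         # values beyond the whiskers
out

c(mean = mean(scores), median = median(scores))  # the mean is pulled up, the median is not

clean <- scores[!scores %in% out]        # scores with the outlier set aside
c(mean = mean(clean), median = median(clean))

boxplot(scores, main = "Boxplot with one outlier")
```

Whether the flagged value is removed or kept should depend on checking the original record, since it may be real data.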

Observation Location

In the case of a univariate plot, three things are important.

A. location

B. dispersion

C. distribution

Here is one way of locating observations.

Enumerative plots, in which all observations are shown, have the advantage of not losing any specific information: the values of the individual observations can be retrieved from the plot. The disadvantage of such plots arises when there are a large number of observations, because it may be difficult to get an overall view of the properties of a variable. Enumerative plots do a fairly good job of displaying the location, dispersion and distribution of a variable, but may not allow a clear comparison of variables, one to another.

Observation Location (Contd.)

Commands

A. > plot(x$RPM)

B. > hist(x$RPM)

C. > stem(x$RPM)


3 | 3

3 | 57

4 | 11122223

4 | 555556667777888899999

5 | 000000122222333333334444

5 | 555555555555556666666666777777888888999

6 | 00

Data Frame

A data frame is used for creating a table. Here, data from different sources will be merged into a single data frame.

Sources

>a=c(2,3,4,5) # storing numeric data

>b=c("Delhi","Kolkata","Chennai","Mumbai") # storing non-numeric data

Above, numeric data are stored in a and non-numeric data are stored in b.

The command data.frame is used to form a new data table. This is stored in d.

>d=data.frame(a,b)

Both a and b are stored in d as columns.

>d

The new data table will be displayed.

>head(d) # a and b are the headers

>d=data.frame(b,a) # the same table with the column order changed

> d

Data Frame (Contd.)

b a

1 Delhi 2

2 Kolkata 3

3 Chennai 4

4 Mumbai 5

> d[,1]

[1] Delhi Kolkata Chennai Mumbai

Levels: Chennai Delhi Kolkata Mumbai

Data Frame (Contd.)

> names(d)

[1] "b" "a"

EXTENDING DATA FRAME

> f=c("chocolate","cocacola","orange","apple") # new variable f is added

> d=data.frame(b,a,f)

> d

b a f

1 Delhi 2 chocolate

2 Kolkata 3 cocacola

3 Chennai 4 orange

4 Mumbai 5 apple

>pie(a,b)

Class Interval

>score=c(10,15,10,20,20,25,30,20,30,32,40,45,48,50)

> stem(score)


1 | 005

2 | 0005

3 | 002

4 | 058

5 | 0

> summary(score)

Min. 1st Qu. Median Mean 3rd Qu. Max.

10.00 20.00 27.50 28.21 38.00 50.00

Class Interval (contd.)

> table(score)

score
10 15 20 25 30 32 40 45 48 50
 2  1  3  1  2  1  1  1  1  1

> bins=seq(2,50,by=2)

> bins

[1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50

Class Interval (contd.)

> v1.cut=cut(v1,bins,right=F) # v1 is a different score vector from score above

> v1freq=table(v1.cut)

> v1freq

v1.cut
  [2,4)   [4,6)   [6,8)  [8,10) [10,12) [12,14) [14,16) [16,18) [18,20) [20,22)
      7      20      40      24       6       0       0       0       0       0
[22,24) [24,26) [26,28) [28,30) [30,32) [32,34) [34,36) [36,38) [38,40) [40,42)
      0       0       0       0       0       0       0       0       0       0
[42,44) [44,46) [46,48) [48,50)
      0       0       0       0

>
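The same commands can be applied to the score vector defined earlier. The sketch below assumes class intervals of width 10, a choice made here for illustration, so that the frequencies can be compared with the rows of the stem-and-leaf plot of score.

```r
score <- c(10, 15, 10, 20, 20, 25, 30, 20, 30, 32, 40, 45, 48, 50)

bins <- seq(10, 60, by = 10)                  # class limits 10, 20, ..., 60
score.cut <- cut(score, bins, right = FALSE)  # intervals closed on the left: [10,20), [20,30), ...
score.freq <- table(score.cut)                # frequency in each class interval
score.freq

hist(score, breaks = bins, right = FALSE, main = "Scores by class interval")
```

With right = FALSE a score of 20 is counted in [20,30) and not in [10,20), which matches the way stems group the values.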

Naming and Name Calling

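When we are born, our parents give us a name, and we also get a title indicating our heredity or root. Likewise we give names to variables.

Example: four names beginning with Deb.

>Deb=c("debi", "debjyoti", "debbani", "debasree")

R is case-sensitive: the object cannot be called as deb, because its D is a capital letter.

Another thing is that a named member has to be called with its title, that is, the name of the list or data frame that holds it. If Deb were a data frame with columns debi and debjyoti, they would be called as

>Deb$debi

>Deb$debjyoti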

Array use (storing,retrieving,transforming, frequency table, proportion and histogram)

Arrays are the R data objects which can store data in more than two dimensions.

> x=c(1,2,3)

> y=x+2

> y

[1] 3 4 5

Read the data from file and store them

>x=read.csv(file.choose(),header=T)

> str(x)

'data.frame': 82 obs. of 34 variables:

$ SL.NO : int 1 2 3 4 5 6 7 8 9 10 ...

$ NAME : Factor w/ 82 levels " DILIP KUMAR ROY",..: 26 37 15 59 34 50 79 67 68 24 ...

$ area : Factor w/ 2 levels "BANSBARI","BHUYANPARA": 2 2 2 2 2 2 2 2 2 2 ...


$ area_code: int 2 2 2 2 2 2 2 2 2 2 ...

$ J1 : int 3 3 1 3 4 5 3 5 4 4 ...

$ J2 : int 4 4 3 4 4 4 4 4 4 4 ...

$ J3 : int 4 4 0 4 4 4 3 5 4 4 ...

$ J4 : int 2 3 0 3 4 4 3 NA 3 4 ...

$ J5 : int 3 3 0 4 3 3 0 4 4 3 ...

$ J6 : int 4 4 2 4 4 5 4 4 4 3 ...

$ J7 : int 4 4 0 4 3 4 4 4 4 4 ...

$ J8 : int 4 4 0 4 0 0 4 0 0 0 ...


$ J9 : int 0 0 0 1 0 0 0 0 0 0 ...

$ J10 : int 5 5 NA 5 0 0 5 5 4 5 ...

$ J11 : int 2 NA 4 3 3 3 3 4 3 3 ...

$ J12 : int 3 1 NA 3 2 3 4 5 4 4 ...

$ J13 : int 3 2 2 3 3 3 0 4 4 3 ...

$ J14 : int 1 0 0 3 1 4 1 4 3 4 ...

$ J15 : int 3 2 0 3 3 3 2 3 3 3 ...

$ J16 : int 0 0 0 0 0 0 0 0 0 0 ...

$ J17 : int 3 0 0 3 0 0 3 0 0 0 ...

$ J18 : int 4 2 4 3 4 4 4 4 4 4 ...


$ J19 : int 4 3 0 3 0 0 1 4 0 0 ...

$ J20 : int 0 0 0 1 0 0 0 4 NA 0 ...

$ J21 : int 3 1 0 3 3 4 0 NA 2 3 ...

$ J22 : int 1 1 0 4 2 2 0 4 3 4 ...

$ J23 : int 2 3 0 2 4 4 0 5 4 4 ...

$ J24 : int 2 3 1 1 3 3 1 4 3 3 ...

$ J25 : int 4 3 0 3 0 0 0 3 3 3 ...

$ J26 : int 2 2 0 3 3 3 0 3 3 2 ...

$ J27 : int 5 4 0 4 3 0 4 4 3 4 ...

$ J28 : int 4 4 3 4 0 0 4 4 0 3 ...

$ J29 : int 4 3 0 4 4 4 4 5 4 4 ...

$ J30 : int 4 4 4 4 3 2 4 4 4 4 ...

> y=table(x$J1)

> y

 0  1  2  3  4  5
 1 11  4 37 23  5

> (y/sum(y)*100)

        0         1         2         3         4         5
 1.234568 13.580247  4.938272 45.679012 28.395062  6.172840

> z=(y/sum(y)*100)

> z

        0         1         2         3         4         5
 1.234568 13.580247  4.938272 45.679012 28.395062  6.172840

> barplot(z)
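The percentage calculation can also be written with prop.table(). The sketch below rebuilds the frequency table from the counts printed above, since the data file itself is not available here; the axis labels are illustrative assumptions.

```r
# Frequencies of J1 copied from the output above.
y <- as.table(c("0" = 1, "1" = 11, "2" = 4, "3" = 37, "4" = 23, "5" = 5))

z <- prop.table(y) * 100  # same result as y / sum(y) * 100
round(z, 1)

barplot(z, main = "J1 responses", xlab = "Response", ylab = "Per cent")
```

Rounding is applied only for display, so that the stored percentages still add up to 100.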

Change the directory

> dir() # list the files in the working directory

>hpl<-read.csv("RIASEC.csv",header=T,sep=",") # reading csv file

>hpl<-read.csv("RIASEC.csv") # same result: header=T and sep="," are the defaults of read.csv

>str(hpl) # structure of the object

>hpl<-read.table(file.choose(), header=T, sep=",") # choosing the file

>hpl

>hpl<-read.delim(file.choose(),header=T) # choosing and reading a tab-delimited text file

Change the directory (Contd.)

>names(hpl) # names of the variables

>hpl[1,1] # first row and first col

>hpl[, 4] # fourth column

>hpl$Age

>mean(hpl$Age) # average

>sd(hpl$Age) # standard deviation

>sd(hpl$Age)/mean(hpl$Age)*100 # coefficient of variation

>hist(hpl$Age) # histogram

>hist(hpl$Age, breaks=20) # about 20 bins

>plot(hpl$Age ~ y) # scatterplot of Age against another numeric vector y

>stem(hpl$Age) # stem-and-leaf plot

Health is a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity.

Data Visualization

Data is plural; datum is singular. A plural verb should be used with data.

Data are not necessarily numeric; they can be text, audio or pictures.

Health is a directed continuum. It can be assessed with both metric and non-metric measurement scales and with uni-, bi- and multivariate data visualization tools.

Data visualization

Data visualization is a general term that describes any effort to help people understand the significance of data, and to communicate it, by placing it in a visual context.

It is important in health psychological research for describing the health condition, its determinants and its promotion.

Data provide information about psychological and behavioral processes in health, illness, and healthcare.

Besides, visual data help in understanding how psychological, behavioral, and cultural factors contribute to physical health and illness.

History

Historically, data visualization has evolved through the work of noted practitioners. The founder of graphical methods in statistics is William Playfair. William Playfair invented four types of graphs:

● the line graph,

● the bar chart of economic data,

● the pie chart and

● the circle graph.

History -2

Joseph Priestley created the first timeline charts, in which individual bars were used to visualize the life span of a person (1765). That's right: timelines were invented about 250 years ago.

Among the most famous early data visualizations is Napoleon's March as depicted by Charles Minard. The visualization packs in extensive information on the effect of temperature on Napoleon's invasion of Russia, along with time scales. The graphic is notable for its representation in two dimensions of six types of data: the number of Napoleon's troops; distance; temperature; latitude and longitude; direction of travel; and location relative to specific dates.

Florence Nightingale was also a pioneer in data visualization. She drew coxcomb charts to depict the effect of disease on troop mortality (1858).

The use of maps in graphs, or spatial analytics, was pioneered by John Snow (not the one from Game of Thrones!). His map of deaths from a cholera outbreak in London in 1854, drawn in relation to the locations of public water pumps, helped pinpoint the outbreak to a single pump.

http://www.datavis.ca/gallery/re-minard.php

Process

In health psychological research and care, data visualization is important for acquiring, storing, retrieving and using health care information to foster better collaboration among various healthcare providers.

Disease patterns are changing in structure and process. Data visualization can gauge this change by clustering symptoms over periods.

Health infographics can be used for community service. Besides, data visualization helps in health surveillance. By analysing social media posts, the outbreak and geographical span of a condition such as depression can be identified. Accordingly, the surveillance system provider can take measures to stop it.

plot (Year ~ NSW, data=x, pch=16)

Uni, Bi and Multivariate data visualization tools are useful for understanding health status and health related associations.

Univariate Statistics

Univariate analysis is the simplest form of analyzing data. “Uni” means “one”, so in other words data has only one variable.

A variable in univariate analysis is just a condition or subset that your data falls into. You can think of it as a “category.” For example, the

analysis might look at a variable of “age” or it might look at “height” or “weight”. However, it doesn’t look at more than one variable at a time

otherwise it becomes bivariate analysis (or in the case of 3 or more variables it would be called multivariate analysis).

