Data visualization in Health related research
Data Visualization in Health-Related Research
Debdulal Dutta Roy, Indian Statistical Institute, Kolkata
●The Policy seeks to reach everyone in a comprehensive integrated way
to move towards wellness. It aims at achieving universal health coverage
and delivering quality health care services to all at affordable cost.
●The policy is patient-centric and quality-driven. It addresses health
security and "Make in India" for drugs and devices.
●The main objective of the National Health Policy 2017 is to achieve the
highest possible level of good health and well-being, through a
preventive and promotive health care orientation in all developmental
policies, and to achieve universal access to good quality health care
services without anyone having to face financial hardship as a
consequence.
Why do you learn Health Data Visualization ?
16.3.2017
1. Quality control chart
2. Quality diversity or stratification
3. Quality correlates
4. Quality monitoring
5. Quality prediction
6. Quality assurance
Why do you learn Health Data Visualization ?
Efficiency: decreasing costs by avoiding duplicative or unnecessary diagnostic or therapeutic interventions.
Enhancing quality: enhancing the quality of health care by involving consumers as additional power for quality assurance, and directing patient streams to the best quality providers.
Evidence based: e-Health interventions should be evidence based, in the sense that their effectiveness and efficiency should not be assumed but proven by rigorous evaluation.
Empowerment of consumers and patients: by making the knowledge bases of medicine and personal electronic records accessible to consumers over the internet, e-Health opens new avenues for patient-centered medicine and enables evidence-based patient choice.
[Diagram: Health Data Visualization, linked to Access, Quality, Affordability, Lowering of disease burden, Efficiency monitoring and Health surveillance]
Why do you learn Health Data Visualization ?
Encouragement: A new relationship between the patient and health professional, towards a true partnership, where decisions are made in a
shared manner is developed.
Education: Physicians are educated through online resources (such as continuing medical education) and consumers through health education, preventive
information, etc.
Extending: The scope of health care is extended beyond its conventional boundaries. It means both in geographical and conceptual sense,
e-Health enables consumers to easily obtain health services online from global providers.
Ethics: e-Health involves new forms of patient-physician interaction and poses new challenges and threats to ethical issues such as online
professional practice, informed consent, privacy and equity issues.
Equity: People who do not have money, skills, and access to computers and networks cannot use computers effectively. As a result, these
patient populations are those who are least likely to benefit from advances in information technology, unless political measures ensure
equitable access for all.
Why is E-health care important in India ?
India is a vast country with complex socio-economic characteristics that are reflected in its medical systems. These include an insufficient
number of primary care doctors practising in rural and semi-urban areas and an ongoing need to update the knowledge of those who do work
in rural areas.
Qualified doctors’ practice can diverge widely from standards of care, with many medical practitioners lacking formal qualifications altogether.
Out-of-pocket expenditure, which constitutes around 80% of the total healthcare spending in India may be further inflated by travel costs in
both urban and rural areas.
Thus, many conditions remain untreated or are managed with prescription medicines purchased over the counter or by faith healers.
Consequently, the 70% of the population that lives in rural areas, in particular, has limited access to adequate health care.
Finally, epidemiological data are often unavailable or unreliable, hampering the informed design of preventive health programmes.
Data Visualization in health care by Artificial Intelligence
Artificial intelligence (AI) in healthcare uses algorithms and software to approximate human cognition in
the analysis of complex medical data.
The primary aim of health-related AI applications is to analyze relationships between prevention or
treatment techniques and patient outcomes.
AI programs have been developed and applied to practices such as diagnosis processes, treatment
protocol development, drug development, personalized medicine and patient monitoring and care, among
others.
Purpose
Health infographics for community awareness
Communicating statistics for academic purposes
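Descriptive Statistics
> mean(x); median(x) # mean and median of a numeric vector x
> names(which.max(table(x))) # most frequent value; mode(x) gives the storage type, not the statistical mode
> sort(x) # sorting from minimum to maximum
> summary(x)
> # install the psych package once, then load it
> # install.packages("psych")
> library(psych)
> describe(x)
> sd(x)
> table(x)
> prop.table(table(x)) # proportions
> ftable(table(x, y)) # frequency table of several vectors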
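The commands above can be tried on a small made-up vector. The sketch below is only illustrative: the values in x and y are invented, and stat_mode is a hypothetical helper name, needed because base R has no built-in function for the statistical mode.

```r
# Invented example data (not from the lecture data set)
x <- c(3, 5, 5, 2, 4, 5, 3)
y <- c("M", "F", "F", "M", "F", "M", "F")

# Hypothetical helper: the most frequent value of a vector, returned as text
stat_mode <- function(v) {
  counts <- table(v)
  names(counts)[which.max(counts)]
}

mean(x)               # arithmetic mean
median(x)             # middle value
stat_mode(x)          # most frequent value
sd(x)                 # standard deviation
prop.table(table(x))  # proportion of each distinct value
table(x, y)           # two-way frequency table of x by y
```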
Compare two groups
> t.test(x$Safe_school, x$Delinquency)
> oneway.test(y ~ y2) # y is metric and y2 is categorical
> oneway.test(x$Safe_school ~ x$Delinquency)
One-way analysis of means (not assuming equal
variances)
data: x$Safe_school and x$Delinquency
F = 73.84, num df = 8.000, denom df = 81.964,
p-value < 2.2e-16
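The two commands can be compared on simulated data. This is a minimal sketch with invented scores (the names group and score are assumptions, not variables from the lecture file). With exactly two groups, the one-way test that does not assume equal variances gives the same p-value as the Welch t-test, and its F equals the square of t.

```r
set.seed(1)  # makes the invented data reproducible

# Two invented groups of 20 scores each
group <- factor(rep(c("A", "B"), each = 20))
score <- c(rnorm(20, mean = 50, sd = 5), rnorm(20, mean = 55, sd = 5))

res_t <- t.test(score ~ group)       # Welch two-sample t-test
res_f <- oneway.test(score ~ group)  # one-way test, equal variances not assumed

res_t$statistic^2  # t squared
res_f$statistic    # F, equal to t squared when there are two groups
res_t$p.value
res_f$p.value
```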
Measurement Scales
Nominal
Ordinal
Interval
Ratio
Data visualization: Bar plot with SPSS
480 archive records were retrieved. Each record represents a
specific psychiatric disorder.
Results show that more patients suffered from
depression than from other disorders, and that other
psychiatric disorders changed with age.
In SPSS, use Bar plot for the variables: use mean, Row=sex,
Col=Age
Depression by age and sex
Stacked Bar plot
Correspondence Analysis using R
Step 1: Download the ca package
Select the Tools option and enter ca in the space provided. Click on the Install button. The package will be installed.
Correspondence Analysis using R (contd.)
Step 2: Load the library to use the ca package
> library(ca)
Step 3: Read the file for analysis (here
Disorder_Data is the name given to the data read from the file)
> Disorder_Data <- read.table(file.choose(), header=T, sep=",", row.names=1)
Choose the path and
select the data file; R will then
read it. After that, give the
command
> View(Disorder_Data)
and the data will be displayed.
Correspondence Analysis using R (contd.)
Since the data selected for analysis have missing
values, these need to be eliminated before doing CA;
otherwise it will not give any meaningful output.
Step 4: Create a new Excel file in CSV format with
the variables having frequencies in the cells.
After the new file is formed, repeat step 3,
reading the file for analysis (here DD is the name of
the file).
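As an alternative to building the frequency table by hand in Excel, Step 4 can be sketched in R itself. The example assumes a raw file with one row per patient and two columns, here called disorder and age_group; both names and all values are invented for illustration.

```r
# Invented raw data: one row per patient, with some missing values (NA)
raw <- data.frame(
  disorder  = c("Depression", "Anxiety", NA, "Depression", "Anxiety", "Depression"),
  age_group = c("Young", "Young", "Old", "Old", NA, "Young")
)

complete <- na.omit(raw)  # drop every row that has a missing value

# Cross-tabulate: each cell holds the frequency of one disorder in one age group
DD <- table(complete$disorder, complete$age_group)
DD
```

A table of this kind, with frequencies in the cells, is the layout that the later steps expect.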
Correspondence Analysis using R (contd.)
Step 5: For CA, write the following
command and then generate the plot
> ca_analysis <- ca(DD)
> plot(ca_analysis, col=1, cex=.6, xlim=c(-.225,.375), ylim=c(-4,.4))
> plot(ca_analysis, col=1, cex=.01, xlim=c(-.225,.375), ylim=c(-.4,.4))
Correspondence Analysis using R (contd.)
Step 6: To see the summary enter the following
command
> summary(ca_analysis)
Data visualization In Ordinal Data
Steps for representing ordinal data graphically in Excel
1. Enter the data in Excel.
2. Rank the data using the RANK function,
e.g. =RANK(B1,$B$1:$B$12)
3. Calculate the median of the ranks.
4. Assign a negative sign to ranks below the median.
5. Custom sort the ranks from smallest to largest after selecting the data.
6. Select the ‘ranks’ column, go to the ‘Insert’ tab and then select ‘clustered bar’ from ‘bar
chart’.
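The same steps can be sketched in R. The scores and labels below are invented, and the chart is written to a temporary PDF file so that the sketch also runs outside an interactive session. Note one difference: Excel's RANK gives rank 1 to the largest value by default, whereas rank() in R gives rank 1 to the smallest.

```r
# Step 1: invented ordinal scores for six cases labelled A to F
scores <- c(A = 12, B = 7, C = 9, D = 15, E = 4, F = 10)

r      <- rank(scores)             # step 2: rank the data (1 = smallest)
med    <- median(r)                # step 3: median of the ranks
signed <- ifelse(r < med, -r, r)   # step 4: negative sign below the median
signed <- sort(signed)             # step 5: sort from smallest to largest

# Step 6: horizontal bar chart of the signed ranks
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
barplot(signed, horiz = TRUE, main = "Signed ranks", xlab = "Rank")
invisible(dev.off())

signed
```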
Multiple graphs in single page
Nobody will teach you data analytics for free.
Courses cost from 30k to 1 lakh or more.
Data Visualization with R (4 graphs, 2 X 2)
>par(mfrow=c(2,2))
>boxplot(x$Safe_school, main="Box-Plot of safe school")
>boxplot(x$Delinquency, main="Delinquency")
> hist(x$Safe_school, main="Histogram of safe school")
> hist(x$Delinquency, main="Histogram of Delinquency")
Data visualization In Regression
Data Visualization class with R (Regression)
>cor(x,use="complete.obs", method="pearson" )
>plot(x$Safe_school~x$Delinquency, xlab="Less_delinquency", ylab="Safe_school_perception", main="Relation of Safe school perception and Delinquency")
>lm(formula = x$Safe_school ~ x$Delinquency)
>abline(26.79,3.20)
Health diversity
Health diversity is a critical factor in developing health literacy.
Besides biological reasons, age, gender, caste, religion and
socio-economic status cause diversity in health behaviour.
Health infographics and Data Visualization in Health Communication
Infographics are visual representations of facts, events, and numbers, and can be used to depict health statistics, risk
assessments, and resources to name a few. The application of visual pattern, illustration, and iconography enhance the way
information is cognitively consumed. Therefore, infographics demand creativity such that they are designed with appeal and
comprehension targeted to a particular audience, are context-specific, and center on relevant themes from a particular
dataset.
1. Easy to remember: visual impact can play an important role in how memorable the information is to the reader.
Ineffective presentation of data can turn off target audiences, resulting in missed messages.
2. Data sets are often difficult for the general public to find. With the click of a mouse, health concepts and statistics can
be shared virally, from any site to the next, on personal pages, or in journalistic formats; where raw data is much
more difficult to interpret, communicate, and embed. Infographics are designed around themes and concepts that
concisely tell a story, eliminating the need for sharers to reinvent.
3. Data visualization differs slightly from infographics: data visualization is all about the numbers, abstracting data sets
into schematics that clarify statistics in the form of graphs, maps or charts. These are usually constructed
scientifically and automatically by software. Where infographics are designed to communicate a story, data
visualization is less holistic and more quantifiable.
Mental Health Infographics with Pie Chart
Mental Health Infographics with Histogram
Mental Health Infographics with Correspondence map
Chi-square
> tb1=table(x$AGE.CODE,x$CASTE)
> tb1
      1   2   3
  1 219  19   2
  2 219  21   0
> chisq.test(tb1)
Pearson's Chi-squared test
data: tb1
X-squared = 2.1, df = 2, p-value = 0.3499
Warning message:
In chisq.test(tb1) : Chi-squared approximation may be incorrect
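The warning appears because some expected cell counts are small (below 5). A minimal sketch, rebuilding the table printed above by hand, shows the expected counts and one commonly used alternative, Fisher's exact test. The choice of fisher.test here is an assumption about what suits the data, not part of the original analysis.

```r
# Rebuild the 2 x 3 table printed above (rows: AGE.CODE, columns: CASTE)
tb1 <- matrix(c(219, 19, 2,
                219, 21, 0),
              nrow = 2, byrow = TRUE,
              dimnames = list(AGE.CODE = c("1", "2"),
                              CASTE = c("1", "2", "3")))

res <- suppressWarnings(chisq.test(tb1))  # same test, warning silenced
res$expected   # expected counts; caste 3 has an expected count of 1 in each row
res$statistic  # X-squared = 2.1, as in the output above

fisher.test(tb1)$p.value  # exact test, does not rely on the chi-squared approximation
```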
T-test in r
# independent 2-group t-test
t.test(y~x) # where y is numeric and x is a binary factor
# independent 2-group t-test
t.test(y1,y2) # where y1 and y2 are numeric
# paired t-test
t.test(y1,y2,paired=TRUE) # where y1 & y2 are numeric
# one sample t-test
t.test(y,mu=3) # Ho: mu=3
Non-Parametric Test
# independent 2-group Mann-Whitney U Test
wilcox.test(y~A)
# where y is numeric and A is a binary factor
# independent 2-group Mann-Whitney U Test
wilcox.test(y,x) # where y and x are numeric
# dependent 2-group Wilcoxon Signed Rank Test
wilcox.test(y1,y2,paired=TRUE) # where y1 and y2 are numeric
# Kruskal Wallis Test One Way Anova by Ranks
kruskal.test(y~A) # where y is numeric and A is a factor
# Randomized Block Design - Friedman Test
friedman.test(y~A|B) # where y is numeric, A is the treatment factor and B is the blocking factor
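The block-design call is the least obvious of these, so here is a small sketch with invented data: five blocks (for example, five patients), each measured once under three treatments. The data frame d and its values are assumptions made only for illustration.

```r
# Invented data: 5 blocks (B), each measured once under 3 treatments (A)
d <- data.frame(
  y = c(7, 5, 6,  8, 6, 7,  6, 4, 5,  9, 7, 8,  7, 6, 5),
  A = factor(rep(c("t1", "t2", "t3"), times = 5)),
  B = factor(rep(1:5, each = 3))
)

kw <- kruskal.test(y ~ A, data = d)       # ignores the blocks
fr <- friedman.test(y ~ A | B, data = d)  # ranks the treatments within each block

kw$p.value
fr$statistic  # Friedman chi-squared
fr$p.value
```

The Friedman test needs exactly one observation for every combination of A and B, which is why each block appears once per treatment.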
Why R?
In my data visualisation classes, I often produced graphics with SPSS. SPSS is menu driven, so as a researcher I find I have little control over the program.
Second, the graphic output of SPSS is not appealing, for example in its line thickness.
> x <- c(1,2,3,4,5)
> x
[1] 1 2 3 4 5
> hist(x)
Here I created one vector or variable named x (small letter) and drew its histogram. Sometimes I am confused as to which is right, SPSS or R, for the same
data. For example, for the same vector SPSS and R provide different graphs.
R programs can draw on more than 7000 libraries. R is regularly updated by data scientists all around the world, and all facilities are open
access. Finally, there are more kinds of graphics in R. Furthermore, R is interactive, as if you are talking with the machine. Here is one example.
> x=5+4
> x
[1] 9
>
I read Garrett and Guilford. I learnt many formulae, but when I use SPSS I feel awkward: the formulae have no use there, as I am acting like a machine. I have no
freedom to control. But in R, I can control my analysis.
How can I install R?
R can be used on its own or through RStudio. I prefer RStudio. RStudio needs R itself to be installed first. Following are the steps:
Go to https://www.rstudio.com/products/rstudio/download/. In the ‘Installers for Supported Platforms’ section, choose and click
the RStudio installer for your operating system.
The download should begin as soon as you click. Run the installer and click Next, Next, Finish to complete the installation.
To start RStudio, click on its desktop icon or use ‘search windows’ to access the program. It looks like this:
Why R Studio?
In data visualisation, it is important to manipulate attributes of the graph. This interactive change is visible through RStudio.
RStudio is a free and open-source integrated development environment (IDE) for R, a programming language for statistical computing and
graphics. RStudio was founded by JJ Allaire, creator of the programming language ColdFusion. Hadley Wickham is the Chief Scientist at
RStudio.
Researchers change attributes in the R console and the results are displayed in the graphical output window. Old script is visible in the script window
and the current R environment is visible in the environment window. The environment window helps in understanding stored variables. For example:
> Iq <- 100
> Iq1 <- 120
> ls()
[1] "Iq"  "Iq1"
One can remove a variable by
> rm(Iq)
How can I transfer file?
I have noticed that R is friendlier to CSV files. CSV stands for "comma-separated values". CSV is a simple file format used
to store tabular data, such as a spreadsheet or database. Files in the CSV format can be imported to and exported from
programs that store data in tables, such as Microsoft Excel.
So, my suggestion is to convert your spreadsheet, with its header row, into a CSV file by using Save As <file name>.csv
How can I transfer file? (contd.)
Input file
a,b,c
10,20,30
10,20,30
In the console, type
> my.data = read.table(file.choose(), header=TRUE, sep=",")
Here file.choose() opens a dialog in which you select the source file. The argument sep="," tells read.table() that the values are separated by commas.
> my.data
   a  b  c
1 10 20 30
2 10 20 30
If you know the source, then write
> my.data <- read.csv("C:/Users/DDROY/Desktop/test.csv")
You can remove the data object by
> rm(my.data)
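The effect of the separator can be checked without any file on disk. This sketch writes the three-line input file shown above to a temporary location (csv_path is an invented name) and reads it back three ways.

```r
# Write the small input file to a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "10,20,30", "10,20,30"), csv_path)

wrong <- read.table(csv_path, header = TRUE)             # no sep: splits on white space only
right <- read.table(csv_path, header = TRUE, sep = ",")  # comma given explicitly
same  <- read.csv(csv_path)                              # read.csv uses the comma by default

names(wrong)  # a single column called "a.b.c"
names(right)  # three columns: "a" "b" "c"
str(same)
```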
Is interactive mode possible?
Interactive refers to a two-way flow of information between a computer and a computer user; the computer responds to the user's input.
R programming is interactive. Here researchers can communicate with the computer as if with a friend. The only thing you have to learn is its own
language.
The sign '>' is the prompt: R treats what you type after it as a command.
For example
> x=10/5
> x
[1] 2
Or
> x=c("Kolkata","Delhi")
> x
[1] "Kolkata" "Delhi"
So the same variable name can hold numeric or alphanumeric data, one at a time.
Scatter Plot
Scatter plot (also called a scatter graph, scatter chart, scattergram, or scatter diagram) is a type of plot or mathematical
diagram using Cartesian coordinates to display values for typically two variables for a set of data.
A scatter plot is useful for understanding the relation between two variables. Usually the explanatory variable is on the X-axis and the
dependent variable, which changes with it, is on the Y-axis.
Scatter Plot (contd.)
R is very good for statistics. In the input file, write variable name as header. And in read command, write header = T or TRUE.
> hp<-read.csv("aust.csv",header=T,sep=",")
> hp
Year NSW Vic. Qld SA WA Tas
1 1917 1904 1409 683 44 0 3 6
2 1927 2402 1727 873 56 5 3 92
3 1937 2693 1853 993 58 9 4 57
4 1947 2985 2055 110 6 6 46 502
5 1957 3625 2656 141 3 8 73 688
6 1967 4295 3274 170 0 1 110 87
7 1977 5002 3837 213 0 1 286 12
8 1987 5617 4210 267 5 1 393 14
9 1997 6274 4605 340 1 1 480 17
> plot(NSW~Year, data=hp, pch=16)
Steps for Simple Regression in R
Step 1: Correlation
> cor(x, use="complete.obs", method="pearson")
Step 2: Plot
> plot(x$Safe_school~x$Delinquency)
Step 3: Run the regression between Safe school and
Delinquency
> lm(formula = x$Safe_school ~ x$Delinquency)
This will generate the intercept and the slope coefficient.
Step 4: After getting the intercept and coefficient, use the
following command to add the regression line to the plot
> abline(Intercept, Coefficient) # replace with the two values from Step 3
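The four steps can be sketched end to end on simulated data. The variable names and numbers below are invented, not taken from the lecture file. Passing the fitted model to abline() draws the line directly, which avoids retyping the intercept and slope; the plot is written to a temporary PDF so that the sketch also runs outside an interactive session.

```r
set.seed(42)  # reproducible invented data

delinquency <- round(runif(30, min = 1, max = 9))
safe_school <- 27 + 3 * delinquency + rnorm(30, sd = 4)

cor(safe_school, delinquency)          # Step 1: correlation

fit <- lm(safe_school ~ delinquency)   # Step 3: regression
coef(fit)                              # intercept and slope

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
plot(safe_school ~ delinquency, pch = 16)  # Step 2: scatter plot
abline(fit)                                # Step 4: fitted line taken from the model
invisible(dev.off())
```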
Scatter Plot using SPSS
Step 1: Graphs-----> Legacy Dialogs----> Scatter/dot-----> Simple
Step 2: Select the Safe_total variable and move it to Y-axis and
School attendance motivation to X-axis
Step 3: Select Caste Code and move it to Panel (Row box)
Step 4: Click Ok
Syntax
GRAPH
/SCATTERPLOT(BIVAR)=SAM_total WITH safe_total
/PANEL ROWVAR=Caste_code ROWOP=NEST
/MISSING=LISTWISE.
Histogram
A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous univariate
data. This allows the inspection of the data for its underlying distribution (e.g., normal distribution), outliers, skewness, etc.
Initially the number of scores in each class interval or bin is counted. Then the plot is prepared. The plot shows the bin-wise frequency of scores.
A histogram can be drawn in R with the following arguments.
> hist(x$V1, main="Anxiety", xlab="Level", ylab="Number of cases", border="blue")
Here
x$V1 = data of V1
main = graph title
xlab = name of x axis
ylab = name of y axis
border = colour of the bar outlines (here blue)
>hist(x$V1)
>hist(x$V1, main="my histogram")
>hist(x$V1, main="my histogram", xlab="Anxiety")
>hist(x$V1, main="my histogram", xlab="Anxiety", ylab="frequency")
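The counting step that precedes the drawing can be inspected directly. In this sketch the anxiety scores are invented; hist() is called with plot = FALSE so that it returns the bin boundaries and counts instead of drawing.

```r
# Invented anxiety scores
anxiety <- c(12, 15, 17, 21, 22, 22, 25, 28, 31, 34, 35, 39)

# Count the scores in bins of width 10 without drawing anything
h <- hist(anxiety, breaks = seq(10, 40, by = 10), plot = FALSE)

h$breaks  # bin boundaries: 10 20 30 40
h$counts  # number of scores in each bin: 3 5 4
```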
Random Number
We have read about the standard normal distribution, in which the mean is 0 and the SD is 1. We have read about it but we have not seen it. Today, I will show you
data drawn from a distribution with mean 0 and SD 1. But keep in mind that mean 0 is the ideal condition; a random sample will not match it exactly. Therefore
the sample mean will only be close to 0.
This is the example:
> x=rnorm(10,0,1) # 10 is number of data, 0 is Mean, 1 is SD.
> x
[1] -0.8161694 1.4684068 0.2120832 -2.2949087 -0.4389617 1.3511973
[7] -0.9904338 -1.8606424 0.3877817 0.5777760
> mean(x)
[1] -0.2403871
> sd(x)
[1] 1.269368
> x=rnorm(10,0,1)
> x
[1] -1.4317261 0.6816895 -2.8454675 -0.6584917 1.1255525 -1.7652789
[7] -0.7825077 -0.2994345 0.2827423 -0.8911193
> mean(x)
[1] -0.6584041
> sd(x)
[1] 1.18645
>
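How close the sample mean and SD come to 0 and 1 depends on the sample size. The sketch below fixes the random seed (an addition, so that the numbers can be reproduced) and compares a sample of 10 with a sample of 100000; the larger sample will typically sit much closer to the ideal values.

```r
set.seed(123)  # fixes the random numbers so the run can be repeated

small <- rnorm(10, mean = 0, sd = 1)      # 10 values, as in the example above
large <- rnorm(100000, mean = 0, sd = 1)  # a much larger sample

c(mean = mean(small), sd = sd(small))
c(mean = mean(large), sd = sd(large))
```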
Reading file from any directory
R reads data from the working directory. Therefore, we either change the directory first or pass a separate argument. Here is an
argument with which the researcher points to the source file directly.
>my.data = read.table(file.choose(), header=TRUE)
Check the data with this command
>str(my.data)
Bar Plot and Histogram
Histograms are used to show distributions of variables while bar charts are used to compare variables. Histograms plot
quantitative data with ranges of the data grouped into bins or intervals, while bar charts plot categorical data.
x<-read.csv(file.choose(),header=TRUE)
barplot(x$Anxiety)
> x$Anxiety
[1] 42 32 19 42 30 5 11 8 0 5 3 2
Color Bar Plot
>barplot(x$Anxiety, col="blue", horiz=T, main="health data", xlab="year")
Line Chart
Monthwise anxiety scores
>v=c(7,12,28,3,41)
>plot(v,type="o",col="red",xlab="Month", main="Anxiety scores over months")
Two line charts
t=c(14,7,6,19,3)
> lines(t,type="o",col="blue")
Stem-Leaf Plot
A stem and leaf plot is a special table where each data value is split into a "stem" (the first digit or digits) and a "leaf" (usually the last digit).
Like in this example:
1 2345
2 2345678
3 12345678912345678
4 234
5 34
53 is divided into two parts: 5 is the stem and 3 is the leaf.
Likewise for 54.
When the value is a decimal, the digit after the decimal point is the leaf. For example 2.5, 2.6, 2.7, 1.5, 1.9:
1 5 9
2 5 6 7
Here is a data set.
Stem-Leaf Plot (contd.)
> x=c(10,12,22,25,34,22,23)
> stem(x)
The decimal point is 1 digit(s) to the right of the |
1 | 02
1 |
2 | 223
2 | 5
3 | 4
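R splits each stem into two rows here (leaves 0 to 4, then leaves 5 to 9), so the result looks different from the hand-made table.
Written with one row per stem, it is
1 02
2 2235
3 4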
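The one-row-per-stem layout can be produced with a few lines of base R. This is a hand-rolled sketch for two-digit whole numbers, not a replacement for stem(): the tens digit is taken as the stem and the units digit as the leaf.

```r
x <- c(10, 12, 22, 25, 34, 22, 23)

x_sorted <- sort(x)
stems    <- x_sorted %/% 10  # tens digit
leaves   <- x_sorted %% 10   # units digit

# Paste together all the leaves that share a stem
display <- tapply(leaves, stems, paste, collapse = "")

cat(paste(names(display), display, sep = " | "), sep = "\n")
# 1 | 02
# 2 | 2235
# 3 | 4
```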
Boxplot
A boxplot is a simple way of representing statistical data on a plot in which a rectangle is drawn to represent the second and
third quartiles, usually with a line inside to indicate the median value. Whiskers on either side of the rectangle extend towards the
smallest and largest values.
> boxplot(x, main="Depression level, High=less depressed", ylab="Scores", xlab="Fig.1. Few people are still depressed")
> boxplot(x, main="Depression level, High=less depressed", ylab="Scores", xlab="Fig.1. Few people are still depressed", col="darkgreen")
Boxplot (Contd.)
I presented the boxplot of a single variable. Here is the distribution of several variables.
The data are entered into x. So the command is:
> boxplot(x)
Since the variables are not all on the same scale, the distribution is not clear. Therefore, always keep all the variables on the same
scale.
Here is the data.frame:
> str(x) # command for data structure
'data.frame': 9 obs. of 7 variables:
$ Year: int 1917 1927 1937 1947 1957 1967 1977 1987 1997
$ NSW : int 1904 2402 2693 2985 3625 4295 5002 5617 6274
$ Vic.: int 1409 1727 1853 2055 2656 3274 3837 4210 4605
$ Qld : int 683 873 993 110 141 170 213 267 340
$ SA : Factor w/ 8 levels "0 1","1 1","3 8",..: 4 6 7 8 3 1 1 5 2
$ WA : Factor w/ 9 levels "0 3","110","286",..: 1 7 9 5 8 2 3 4 6
$ Tas : int 6 92 57 502 688 87 12 14 17
>
Boxplot (Contd.)
In earlier class lectures, you have understood the importance of the box-whisker plot developed by John Tukey. I presented boxplots
of single and multiple variables. In the case of multiple variables, I spoke about similar scaling of all variables. In this lecture,
I will show you outliers. An outlier affects the central tendency. An outlier may arise from a typing mistake or it may be
real data. Outliers also disturb correlation coefficients. Therefore it is important to examine whether outliers exist.
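A small sketch shows how an outlier can be found and how much it shifts the mean. The scores are invented, with 95 playing the part of a possible typing mistake; boxplot.stats() returns the points that a boxplot would draw beyond the whiskers.

```r
# Invented scores; 95 is the suspicious value
scores <- c(12, 14, 15, 15, 16, 17, 18, 95)

boxplot.stats(scores)$out    # points beyond the whiskers: 95

mean(scores)                 # 25.25, pulled upwards by the outlier
median(scores)               # 15.5, hardly affected
mean(scores[scores != 95])   # mean without the outlier
```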
Observation Location
In the case of a univariate plot, three things are important.
A. location
B. dispersion
C. distribution
Here is one way to show the location of observations:
A. Enumerative plots, in which all observations are shown, have the advantage of not losing any specific information–the values of the
individual observations can be retrieved from the plot. The disadvantage of such plots arises when there are a large number of observations–
it may be difficult to get an overall view of the properties of a variable. Enumerative plots do a fairly good job of displaying the location,
dispersion and distribution of a variable, but may not allow a clear comparison of variables, one to another.
Observation Location (Contd.)
Commands
A. > plot(x$RPM)
B. > hist(x$RPM)
C. > stem(x$RPM)
The decimal point is 1 digit(s) to the right of the |
3 | 3
3 | 57
4 | 11122223
4 | 555556667777888899999
5 | 000000122222333333334444
5 | 555555555555556666666666777777888888999
6 | 00
Data Frame
A data frame is used for creating a table. Here data from different sources are merged into a single data frame.
Sources
>a=c(2,3,4,5) # storing numeric.
>b=c("Delhi","Kolkata","Chennai","Mumbai") #storing non-numeric.
Above, numeric data are stored in a and non-numeric data are stored in b.
The command data.frame is used to form a new data table. This is stored in d.
>d=data.frame(a,b)
Both a and b are stored in d as columns.
>d
The new data table will be displayed.
>head(d) # a and b are headers.
>d=data.frame(b,a)
> d
Data Frame (Contd.)
b a
1 Delhi 2
2 Kolkata 3
3 Chennai 4
4 Mumbai 5
> d[,1]
[1] Delhi Kolkata Chennai Mumbai
Levels: Chennai Delhi Kolkata Mumbai
Data Frame (Contd.)
> names(d)
[1] "b" "a"
EXTENDING DATA FRAME
> f=c("chocolate","cocacola","orange","apple") # new variable f is added
> d=data.frame(b,a,f)
> d
b a f
1 Delhi 2 chocolate
2 Kolkata 3 cocacola
3 Chennai 4 orange
4 Mumbai 5 apple
>pie(a,b)
Class Interval
>score=c(10,15,10,20,20,25,30,20,30,32,40,45,48,50)
> stem(score)
The decimal point is 1 digit(s) to the right of the |
1 | 005
2 | 0005
3 | 002
4 | 058
5 | 0
> summary(score)
Min. 1st Qu. Median Mean 3rd Qu. Max.
10.00 20.00 27.50 28.21 38.00 50.00
Class Interval (contd.)
> table(score)
score
10 15 20 25 30 32 40 45 48 50
2 1 3 1 2 1 1 1 1 1
> bins=seq(2,50,by=2)
> bins
[1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
Class Interval (contd.)
Here v1 is a separate vector of scores (not shown).
> v1.cut=cut(v1,bins,right=F)
> v1freq=table(v1.cut)
> v1freq
v1.cut
  [2,4)   [4,6)   [6,8)  [8,10) [10,12) [12,14) [14,16) [16,18) [18,20) [20,22)
      7      20      40      24       6       0       0       0       0       0
[22,24) [24,26) [26,28) [28,30) [30,32) [32,34) [34,36) [36,38) [38,40) [40,42)
      0       0       0       0       0       0       0       0       0       0
[42,44) [44,46) [46,48) [48,50)
      0       0       0       0
>
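The same grouping can be applied to the score vector defined earlier. In this sketch the bin width of 10 and the upper limit of 60 are choices made for illustration; right = FALSE means each interval includes its lower limit and excludes its upper limit.

```r
score <- c(10, 15, 10, 20, 20, 25, 30, 20, 30, 32, 40, 45, 48, 50)

bins <- seq(10, 60, by = 10)  # class limits: 10, 20, ..., 60

score.cut  <- cut(score, bins, right = FALSE)  # assign each score to an interval
score.freq <- table(score.cut)                 # frequency per class interval
score.freq
# [10,20) [20,30) [30,40) [40,50) [50,60)
#       3       4       3       3       1
```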
Naming and Name Calling
When we are born, our parents give us a name, and we also get a title (surname) indicating our heredity or root. Likewise we give names to
variables.
Example: Four people whose names begin with Deb.
>Deb=list(debi="debi", debjyoti="debjyoti", debbani="debbani", debasree="debasree")
One cannot call Debi, because R is case-sensitive and the d is small.
Another thing is that you have to call debi with the title. So
>Deb$debi
>Deb$debjyoti
Array use (storing,retrieving,transforming, frequency table, proportion and histogram)
Arrays are the R data objects which can store data in more than two dimensions.
> x=c(1,2,3)
> y=x+2
> y
[1] 3 4 5
Read the data from file and store them
>x=read.csv(file.choose(),header=T)
> str(x)
'data.frame': 82 obs. of 34 variables:
$ SL.NO : int 1 2 3 4 5 6 7 8 9 10 ...
$ NAME : Factor w/ 82 levels " DILIP KUMAR ROY",..: 26 37 15 59 34 50 79 67 68 24 ...
$ area : Factor w/ 2 levels "BANSBARI","BHUYANPARA": 2 2 2 2 2 2 2 2 2 2 ...
$ area_code: int 2 2 2 2 2 2 2 2 2 2 ...
$ J1 : int 3 3 1 3 4 5 3 5 4 4 ...
$ J2 : int 4 4 3 4 4 4 4 4 4 4 ...
$ J3 : int 4 4 0 4 4 4 3 5 4 4 ...
$ J4 : int 2 3 0 3 4 4 3 NA 3 4 ...
$ J5 : int 3 3 0 4 3 3 0 4 4 3 ...
$ J6 : int 4 4 2 4 4 5 4 4 4 3 ...
$ J7 : int 4 4 0 4 3 4 4 4 4 4 ...
$ J8 : int 4 4 0 4 0 0 4 0 0 0 ...
$ J9 : int 0 0 0 1 0 0 0 0 0 0 ...
$ J10 : int 5 5 NA 5 0 0 5 5 4 5 ...
$ J11 : int 2 NA 4 3 3 3 3 4 3 3 ...
$ J12 : int 3 1 NA 3 2 3 4 5 4 4 ...
$ J13 : int 3 2 2 3 3 3 0 4 4 3 ...
$ J14 : int 1 0 0 3 1 4 1 4 3 4 ...
$ J15 : int 3 2 0 3 3 3 2 3 3 3 ...
$ J16 : int 0 0 0 0 0 0 0 0 0 0 ...
$ J17 : int 3 0 0 3 0 0 3 0 0 0 ...
$ J18 : int 4 2 4 3 4 4 4 4 4 4 ...
$ J19 : int 4 3 0 3 0 0 1 4 0 0 ...
$ J20 : int 0 0 0 1 0 0 0 4 NA 0 ...
$ J21 : int 3 1 0 3 3 4 0 NA 2 3 ...
$ J22 : int 1 1 0 4 2 2 0 4 3 4 ...
$ J23 : int 2 3 0 2 4 4 0 5 4 4 ...
$ J24 : int 2 3 1 1 3 3 1 4 3 3 ...
$ J25 : int 4 3 0 3 0 0 0 3 3 3 ...
$ J26 : int 2 2 0 3 3 3 0 3 3 2 ...
$ J27 : int 5 4 0 4 3 0 4 4 3 4 ...
$ J28 : int 4 4 3 4 0 0 4 4 0 3 ...
$ J29 : int 4 3 0 4 4 4 4 5 4 4 ...
$ J30 : int 4 4 4 4 3 2 4 4 4 4 ...
Array use (storing, retrieving, transforming, frequency table, proportion and histogram) Contd.
> y=table(x$J1)
> y
 0  1  2  3  4  5
 1 11  4 37 23  5
> (y/sum(y)*100)
        0         1         2         3         4         5
 1.234568 13.580247  4.938272 45.679012 28.395062  6.172840
> z=(y/sum(y)*100)
> z
        0         1         2         3         4         5
 1.234568 13.580247  4.938272 45.679012 28.395062  6.172840
> barplot(z)
Change the directory
> dir() # list the files in the working directory
> hpl <- read.csv("RIASEC.csv", header=T, sep=",") # reading csv file
> hpl <- read.csv("RIASEC.csv")
> str(hpl) # structure of the object
> hpl <- read.table(file.choose(), header=T, sep=",") # choosing the file
> hpl
> hpl <- read.delim(file.choose(), header=T) # choosing and reading a delimited text file
Change the directory (Contd.)
> names(hpl) # names of the variables
> hpl[1,1] # first row and first col
> hpl[, 4] # fourth column
> hpl$Age
> mean(hpl$Age) # average
> sd(hpl$Age) # standard deviation
> sd(hpl$Age)/mean(hpl$Age)*100 # coefficient of variation
> hist(hpl$Age) # histogram
> hist(hpl$Age, breaks=20) # about 20 bins
> plot(hpl$Age ~ y) # scatterplot, where y is another numeric variable
> stem(hpl$Age) # stem-leaf plot
Health is a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity.
Data Visualization
Data is plural; datum is singular.
A plural verb should be used with data.
Data are not necessarily numeric; they can be text,
audio or pictures.
Health is a directed continuum. It can be assessed with both metric and non-metric measurement scales and with uni-, bi- and multivariate data visualization tools.
Data visualization
Data visualization is a general term that describes any effort to help people understand the significance of
data and to communicate by placing it in a visual context.
It is important in health psychological research to describe the health condition, its determinants and
promotion.
Data provide information about psychological and behavioral processes in health, illness, and healthcare.
Besides, visual data help in understanding how psychological, behavioral, and cultural factors contribute to
physical health and illness.
History
Historically, data visualization has evolved through the work of noted practitioners. The founder of graphical methods in
statistics is William Playfair, who invented four types of graphs:
● the line graph,
● the bar chart of economic data,
● the pie chart and
● the circle graph.
History -2
Joseph Priestley created the first timeline charts, in which individual bars were used to visualize the life span of a person (1765).
That’s right, timelines were invented over 250 years ago.
Among the most famous early data visualizations is Napoleon’s March as depicted by Charles Minard. The data visualization packs in extensive
information on the effect of temperature on Napoleon’s invasion of Russia along with time scales. The graphic is notable for its representation in two
dimensions of six types of data: the number of Napoleon’s troops; distance; temperature; the latitude and longitude; direction of travel; and location
relative to specific dates.
Florence Nightingale was also a pioneer in data visualization. She drew coxcomb charts depicting the effect of disease on troop mortality (1858).
The use of maps in graphs or spatial analytics was pioneered by John Snow (not the one from Game of Thrones!). His map of deaths from a cholera
outbreak in London, 1854, in relation to the locations of public water pumps, helped pinpoint the outbreak to a single pump.
Process
In health psychological research and care, data visualization is important for acquiring, storing, retrieving and using health care
information to foster better collaboration among various healthcare providers.
Disease patterns are changing in structure and process. Data visualization can gauge this change by clustering symptoms over periods.
Health infographics can be used for community service. Besides, data visualization helps in health surveillance. By analysing social
media posts, the outbreak and geographical span of a condition such as depression can be easily identified. Accordingly, the surveillance system provider can
take measures to stop it.
plot (Year ~ NSW, data=x, pch=16)
Uni, Bi and Multivariate data visualization
tools are useful for understanding health status
and health related associations.
Univariate Statistics
Univariate analysis is the simplest form of analyzing data. “Uni” means “one”, so in other words data has only one variable.
A variable in univariate analysis is just a condition or subset that your data falls into. You can think of it as a “category.” For example, the
analysis might look at a variable of “age” or it might look at “height” or “weight”. However, it doesn’t look at more than one variable at a time
otherwise it becomes bivariate analysis (or in the case of 3 or more variables it would be called multivariate analysis).