Analysing Social Science Data Using R
Class 6: Frequency tables, matrices, arrays, and functions



1 Frequency tables

2 Chi-square tests

3 Matrices and arrays

4 Creating functions

5 Functions for tables

Creating frequency tables

Frequency tables are created with the function table:
table(pgss$kobieta)                         # univariate table
with(pgss, table(kobieta, g5b))             # bivariate table
with(pgss, table(kobieta, g5b, pgssyear))   # multidimensional table
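The pgss data set is not included with these slides, so the sketch below uses a small hypothetical data frame (toy, with made-up values) to show the same three kinds of calls in a self-contained way.

```r
# Hypothetical toy data standing in for pgss (values are made up)
toy <- data.frame(
  kobieta  = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),
  g5b      = c(1, 2, 2, 1, 2, 2),
  pgssyear = c(1999, 1999, 2008, 2008, 2008, 1999)
)

t1 <- table(toy$kobieta)                          # univariate: counts of FALSE/TRUE
t2 <- with(toy, table(kobieta, g5b))              # bivariate: 2 x 2 cross-tabulation
t3 <- with(toy, table(kobieta, g5b, pgssyear))    # three-way: 2 x 2 x 2 array of counts

print(t1)
print(t2)
print(dim(t3))
```

Each extra variable passed to table adds one dimension to the result.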

Tables with many dimensions

with(pgss, table(kobieta, g5b, pgssyear)) # multidimensional table
## , , pgssyear = 1999
##
##        g5b
## kobieta   -9   -2    1    2    3    4    5    6    7
##   FALSE    0  984    0    0    0    0    0    0    0
##   TRUE     0 1298    0    0    0    0    0    0    0
##
## , , pgssyear = 2008
##
##        g5b
## kobieta   -9   -2    1    2    3    4    5    6    7
##   FALSE    0    0   28  109   84   52  166  161   23
##   TRUE     1    0   13  117  118   57  179  152   33

Additional arguments in table

exclude: vector of levels to exclude from the table. Excludes NAs by default.

useNA: whether to show rows/columns corresponding to NA; one of "no" (default), "ifany" (only if NAs are present) or "always".
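A short illustration of useNA, using two made-up character vectors: one with missing values and one without.

```r
x <- c("a", "b", NA, "a", NA)   # two missing values
y <- c("a", "b", "a")           # no missing values

table(x)                        # NAs silently dropped: a = 2, b = 1
table(x, useNA = "ifany")       # adds an <NA> cell because NAs occur
table(y, useNA = "ifany")       # no <NA> cell: y has no NAs
table(y, useNA = "always")      # <NA> cell shown even though its count is 0
```

"ifany" is usually the safer choice when checking survey data, since missing values then become visible without cluttering tables that have none.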

Chi-square test
With χ² tests we test the hypothesis that the variables comprising the table are stochastically independent.

Variables are stochastically independent if each is useless in predicting the other.

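Example: the distribution of the column variable is the same in every row, so knowing the row (a or b) does not help predict the column.

      1    2    3    Total
a    .2   .5   .3    1
b    .2   .5   .3    1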

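1 Given a table of observed frequencies m, calculate what the table would look like if the variables were stochastically independent (say m̂). Each expected count is the product of its row and column totals divided by the sample size n: $\hat{m}_{ij} = m_{i\cdot}\, m_{\cdot j} / n$.

2 Calculate $\chi^2 = \sum_{i,j} \frac{(m_{ij} - \hat{m}_{ij})^2}{\hat{m}_{ij}}$.

3 Compare it to the theoretical χ² distribution with (r - 1)(c - 1) degrees of freedom (r rows, c columns) to get a p-value.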
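The three steps can be traced by hand. This sketch re-enters the PGSS 2008 counts of kobieta by g5b used in this class as a plain matrix m, builds m̂ from the row and column totals, and computes the statistic and p-value; the result should agree with chisq.test.

```r
# Observed counts m (rows: kobieta FALSE/TRUE; columns: g5b 1..7)
m <- matrix(c(28, 109,  84, 52, 166, 161, 23,
              13, 117, 118, 57, 179, 152, 33),
            nrow = 2, byrow = TRUE)

n     <- sum(m)
m_hat <- outer(rowSums(m), colSums(m)) / n         # step 1: expected counts under independence
chi2  <- sum((m - m_hat)^2 / m_hat)                # step 2: Pearson's statistic
df    <- (nrow(m) - 1) * (ncol(m) - 1)             # (r - 1)(c - 1) = 6
p     <- pchisq(chi2, df = df, lower.tail = FALSE) # step 3: upper-tail probability

round(c(chi2 = chi2, df = df, p = p), 4)
```

outer multiplies every row total by every column total, which gives the numerators of m̂ in one call.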

Performing Chi-square tests

Use the chisq.test function with a table as its argument:
g5b <- replace(pgss08$g5b, which(pgss08$g5b == -9), NA)   # treat code -9 as missing
(tab <- table(pgss08$kobieta, g5b))
##        g5b
##           1   2   3   4   5   6   7
##   FALSE  28 109  84  52 166 161  23
##   TRUE   13 117 118  57 179 152  33
chisq.test(tab)
##
##  Pearson's Chi-squared test
##
## data:  tab
## X-squared = 12.64, df = 6, p-value = 0.0492
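chisq.test also accepts two factors directly and builds the table itself, and its result is a list whose components (statistic, parameter, p.value, expected, ...) can be extracted with $. The sketch uses hypothetical data for 100 respondents. Note that for 2 x 2 tables chisq.test applies Yates' continuity correction by default (correct = TRUE).

```r
# Hypothetical raw data: 100 respondents, two categorical variables
x <- factor(rep(c("a", "b"), times = c(60, 40)))
y <- factor(rep(c("yes", "no", "yes", "no"), times = c(35, 25, 15, 25)))

res_table   <- chisq.test(table(x, y))   # table as argument (as on the slide)
res_factors <- chisq.test(x, y)          # two factors: the table is built internally

res_table$statistic     # same value for both calls
res_table$expected      # m-hat: expected counts under independence
res_table$p.value
```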

Matrices and arrays

New types of objects:

matrices: rectangular (two-dimensional) objects containing elements of the same type (numeric/character/logical).

arrays: "multidimensional matrices".

Tables created with table are special types of matrices/arrays.
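Both claims can be checked directly: a two-way table answers TRUE to is.matrix, and a matrix built from mixed input is coerced to a single type. The data are made up.

```r
sex    <- c(TRUE, FALSE, TRUE, TRUE)
answer <- c(1, 2, 2, 1)
t2 <- table(sex, answer)

class(t2)        # "table"
is.matrix(t2)    # TRUE: a two-way table is a matrix
is.array(t2)     # TRUE
dim(t2)          # 2 2

# All elements share one type: mixing characters and numbers coerces to character
m <- matrix(c("a", "b", 1, 2), nrow = 2)
typeof(m)        # "character"
m[1, 2]          # "1" (a string, not a number)
```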

Creating from scratch: matrices

Matrices and arrays can be created from scratch with matrix and array:
matrix(1:4, nrow=2, ncol=2)
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
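A few more options of matrix: by default values fill the matrix column by column, byrow = TRUE fills it row by row, a single value is recycled, and dimnames labels rows and columns at creation time. The labels r1, r2, c1, c2 are arbitrary.

```r
m_col <- matrix(1:4, nrow = 2)                # filled column by column (default)
m_row <- matrix(1:4, nrow = 2, byrow = TRUE)  # filled row by row
m_one <- matrix(0, nrow = 2, ncol = 3)        # a single value is recycled to every cell
m_lab <- matrix(1:4, nrow = 2,
                dimnames = list(c("r1", "r2"), c("c1", "c2")))  # labelled rows/columns
m_col
m_row
m_lab["r2", "c1"]                             # 2
```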

Creating from scratch: arrays

array(1:8, dim=c(2,2,2))
## , , 1
##
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
##
## , , 2
##
##      [,1] [,2]
## [1,]    5    7
## [2,]    6    8
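An array takes one subscript per dimension (row, column, layer); leaving a subscript empty selects everything along that dimension. A sketch with the same array:

```r
a <- array(1:8, dim = c(2, 2, 2))
dim(a)          # 2 2 2: rows, columns, layers
a[, , 2]        # the whole second layer, returned as a 2 x 2 matrix
a[1, 2, 2]      # row 1, column 2, layer 2: 7
a[2, , ]        # row 2 from every layer: a 2 x 2 matrix (columns x layers)
```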

Row and column names: rownames, colnames

tab
##        g5b
##           1   2   3   4   5   6   7
##   FALSE  28 109  84  52 166 161  23
##   TRUE   13 117 118  57 179 152  33
rownames(tab)
## [1] "FALSE" "TRUE"
colnames(tab)
## [1] "1" "2" "3" "4" "5" "6" "7"
rownames(tab) <- c("mężczyzna", "kobieta")   # Polish: "man", "woman"
tab
##            g5b
##               1   2   3   4   5   6   7
##   mężczyzna  28 109  84  52 166 161  23
##   kobieta    13 117 118  57 179 152  33
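rownames and colnames are shortcuts for the two components of dimnames. Assigning a named list to dimnames sets both at once and also labels the dimensions themselves, which is what produces the g5b header above a table. In this sketch the English labels "man"/"woman" stand in for the Polish ones, and the dimension name sex is illustrative.

```r
tab <- matrix(c(28, 109,  84, 52, 166, 161, 23,
                13, 117, 118, 57, 179, 152, 33),
              nrow = 2, byrow = TRUE)

# Set row and column labels in one go; the list names label the dimensions
dimnames(tab) <- list(sex = c("man", "woman"), g5b = as.character(1:7))
tab
names(dimnames(tab))   # "sex" "g5b"
rownames(tab)          # "man" "woman"
```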

Indexing matrices and arrays

Use [ ] as with vectors, but with one subscript per dimension (row, column, ...):
tab
##            g5b
##               1   2   3   4   5   6   7
##   mężczyzna  28 109  84  52 166 161  23
##   kobieta    13 117 118  57 179 152  33
tab[1, 1]    # single element
## [1] 28
tab[1, ]     # first row
##   1   2   3   4   5   6   7
##  28 109  84  52 166 161  23
tab[ , 2]    # second column
## mężczyzna   kobieta
##       109       117
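Subscripts can also be row/column names or vectors of them. When a single row or column is selected, R simplifies the result to a plain vector; drop = FALSE keeps the matrix shape. The sketch re-enters the counts with English row labels (an assumption standing in for the Polish ones).

```r
tab <- matrix(c(28, 109,  84, 52, 166, 161, 23,
                13, 117, 118, 57, 179, 152, 33),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("man", "woman"), as.character(1:7)))

tab["woman", "3"]          # index by names instead of positions: 118
tab[, c("1", "7")]         # several columns at once: a 2 x 2 matrix
tab[1, ]                   # a single row is simplified to a named vector
tab[1, , drop = FALSE]     # drop = FALSE keeps it a 1 x 7 matrix
```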

Negative subscripts

Negative subscripts can be used to drop the associated elements/rows/columns:
tab[-1, ]    # drop the first row
##   1   2   3   4   5   6   7
##  13 117 118  57 179 152  33
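The same works for columns, and a vector of negative subscripts drops several at once. Positive and negative subscripts cannot be mixed in the same index (R stops with an error). Sketch with the re-entered counts and English row labels:

```r
tab <- matrix(c(28, 109,  84, 52, 166, 161, 23,
                13, 117, 118, 57, 179, 152, 33),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("man", "woman"), as.character(1:7)))

tab[, -4]                  # drop column 4
tab[, -c(1, 7)]            # drop the two extreme categories
tab[-1, , drop = FALSE]    # drop row 1 but keep the matrix shape
```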

Creating own functions

functionName <- function(x, y){
  # what do we do with 'x' and 'y'
}

For example, a function computing the mean of x:
mymean <- function(x){
  sum(x) / length(x)
}
mymean( c(1,2,3) )
## [1] 2
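Arguments can have default values, and the function returns the value of its last expression. The sketch below is a hypothetical extension of mymean (the name mymean2 is illustrative) with an optional na.rm argument modelled on the one in R's own mean.

```r
# Hypothetical extension of mymean with an optional argument and a default value
mymean2 <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]    # keep only the non-missing values
  }
  sum(x) / length(x)     # the value of the last expression is returned
}

mymean2(c(1, 2, 3))                # 2
mymean2(c(1, 2, NA))               # NA: sum() propagates missing values
mymean2(c(1, 2, NA), na.rm = TRUE) # 1.5
```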

Functions useful for tables

cbind, rbind: create matrices by "gluing" vectors or matrices column-wise (cbind) or row-wise (rbind)

rowSums, colSums, rowMeans, colMeans: row and column sums and means

prop.table: tables of proportions (of the grand total, or within rows/columns via margin)
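A sketch combining these functions on the counts used in this class: rbind builds the table from two vectors (argument names become row names), rowSums/colSums give the margins, cbind appends a totals column, and prop.table with margin = 1 gives proportions within rows.

```r
men   <- c(28, 109,  84, 52, 166, 161, 23)
women <- c(13, 117, 118, 57, 179, 152, 33)

tab <- rbind(man = men, woman = women)   # glue vectors as rows
colnames(tab) <- 1:7

rowSums(tab)                             # 623 669
colSums(tab)                             # 41 226 202 109 345 313 56
cbind(tab, total = rowSums(tab))         # glue a totals column on the right

p_row <- prop.table(tab, margin = 1)     # proportions within each row
round(p_row, 3)
rowSums(p_row)                           # each row sums to 1
```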

Function apply

Apply any function to each row/column/layer of a matrix or array; the second argument (MARGIN) is 1 for rows, 2 for columns and 3 for layers:
tab
##            g5b
##               1   2   3   4   5   6   7
##   mężczyzna  28 109  84  52 166 161  23
##   kobieta    13 117 118  57 179 152  33
apply(tab, 2, sum)   # column sums
##   1   2   3   4   5   6   7
##  41 226 202 109 345 313  56
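A few more uses of apply, as a sketch on the re-entered counts (English row labels assumed). When the applied function returns a vector rather than a single number, each result becomes one column of the output.

```r
tab <- rbind(man   = c(28, 109,  84, 52, 166, 161, 23),
             woman = c(13, 117, 118, 57, 179, 152, 33))

apply(tab, 1, sum)    # MARGIN = 1: one value per row (same as rowSums)
apply(tab, 2, max)    # MARGIN = 2: largest count in each column

# An anonymous function returning a vector: each result becomes one column
col_props <- apply(tab, 2, function(col) col / sum(col))
dim(col_props)        # 2 7

a <- array(1:8, dim = c(2, 2, 2))
apply(a, 3, sum)      # MARGIN = 3: one value per layer, 10 and 26
```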

Using apply

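Calculate percentages of responses 1-3, 4 and 5-7 in each row:
fun <- function(r){
  # collapse the 7-point scale into three categories;
  # r[[4]] drops the element name (r[4] would label the row "dontknow.4")
  v <- c( agree=sum(r[1:3]), dontknow=r[[4]], disagree=sum(r[5:7]) )
  v / sum(v) * 100
}
apply(tab, 1, fun)
##          mężczyzna kobieta
## agree       35.474   37.07
## dontknow     8.347    8.52
## disagree    56.180   54.41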
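The same percentages can be obtained without apply, as an alternative rather than the method used above: collapse the seven categories with rowSums and cbind, then take row proportions with prop.table. The result has respondents' sex in rows instead of columns; which layout to prefer is a matter of taste. English row labels are assumed.

```r
tab <- rbind(man   = c(28, 109,  84, 52, 166, 161, 23),
             woman = c(13, 117, 118, 57, 179, 152, 33))

# Collapse the 7-point scale into three categories, one column each
collapsed <- cbind(agree    = rowSums(tab[, 1:3]),
                   dontknow = tab[, 4],
                   disagree = rowSums(tab[, 5:7]))

pct <- prop.table(collapsed, margin = 1) * 100   # row percentages
round(pct, 2)
```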

