Statistics analysis with GNU PSPP
Asst.Prof.Dr.Supakit Nootyaskool
Information Technology, KMITL
Version: 3
PSPP
• Statistical analysis tool
• Supports input data files from SPSS
• Syntax and menus similar to SPSS
– Cannot create data graphs or charts; use gnuplot software instead
• Free license under GPLv3 conditions
• Runs on various operating systems
– Linux – BSD (OpenBSD, NetBSD, FreeBSD) – Mac – Windows
• www.gnu.org/software/pspp
Processing data
• Data from the questionnaire
• Data conversion
• Data analysis and discussion
• Interpreting the results
• Writing a summary of the results
Questionnaire
Data conversion and data entering
PSPP: Data View & Variable View
Data type and data size
Frequency analysis and plotting histogram chart
Skewness: how asymmetric the distribution is • Kurtosis: how peaked the distribution is • S.E. mean: the standard error of the mean
Output
Descriptive: Crosstab
• The Pearson chi-squared test is a statistical test that tells whether there is a significant difference between sets of data.
• Suited to large data sets and unpaired data
• PSPP sets the confidence level at 95%
df = the degrees of freedom. The degrees of freedom is a value given by the number of observations minus the number of samples. Example: n = 2 3 3 2 1, so the total number of items = 5; we sample 4, so df = 5 – 4 = 1
Comparative Mean
Difficult to interpret?
• Age/Status – mean age of 30.5 years in regular study
• Sex/Status – mean sex code of 1.45 in regular study
• 1 = man, 2 = woman
Transform: Compute
PSPP: Part 2
Variable in Research
• Independent variable
• Dependent variable: its value changes depending on (as an effect of) the independent variable.
Statistics
Descriptive
• frequency
• mean
• mode
• median
• variance
• standard deviation
Inference
• one-sample t-test
• independent sample t-test
• paired-sample t-test
• one-way ANOVA
• chi-square test
• correlation analysis
• regression analysis
Example1: snack foods
• A company that produces snack foods uses a machine to package bags weighing 454 g each.
• The quality-assurance (QA) team takes a random sample of 24 bags.
• Data: 465, 456, 438, 454, 447, 449, 442, 449, 446, 447, 468, 433, 454, 463, 450, 446, 447, 456, 452, 444, 447, 456, 456, 435
Descriptive statistics
• Data: 465, 456, 438, 454, 447, 449, 442, 449, 446, 447, 468, 433, 454, 463, 450, 446, 447, 456, 452, 444, 447, 456, 456, 435
• Analysis
– Frequency
– Mean, Mode, Median
– Variance
– Standard deviation
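The analysis items listed above can be reproduced outside PSPP as a cross-check. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and are not part of the slides or of PSPP.

```python
import statistics
from collections import Counter

# Bag weights in grams, copied from the slide (n = 24).
weights = [465, 456, 438, 454, 447, 449, 442, 449, 446, 447, 468, 433,
           454, 463, 450, 446, 447, 456, 452, 444, 447, 456, 456, 435]

frequency = Counter(weights)              # how often each weight occurs
mean = statistics.mean(weights)
modes = statistics.multimode(weights)     # every value tied for most frequent
median = statistics.median(weights)
variance = statistics.variance(weights)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(weights)       # sample standard deviation

print("frequency:", sorted(frequency.items()))
print("mean:", mean, "modes:", modes, "median:", median)
print("variance:", round(variance, 3), "std dev:", round(std_dev, 3))
```

Two weights tie for the most frequent value in this sample, which is why the sketch uses `multimode` rather than `mode`.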
T-test
• The t-test is a significance test that focuses on the means of the data.
• The t-test is used to check whether the difference between two data sets is significant.
• It is used for both
– Testing independent data
– Testing dependent data
Inference
• One group (constant)
• Two groups
– Independent groups
– Dependent groups
• More than two groups
• Correlation
– Qualitative
– Quantitative
• Factor of dependent variable
One-sample T test
Five steps for testing hypothesis
1. State the null (H0) and alternative (Ha) hypotheses
2. Select a level of significance; traditionally:
– 0.05 (95% confidence) for consumer research projects
– 0.01 (99% confidence) for quality assurance
– 0.10 (90% confidence) for political polling
Null hypothesis    Do not reject H0    Reject H0
H0 is TRUE         Correct decision    Type I error
H0 is FALSE        Type II error       Correct decision
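Example: H0 is the hypothesis that Peter has gastritis. "Do not reject H0" means Peter is diagnosed with gastritis; "Reject H0" means he is diagnosed as not having it.
• H0 is TRUE, do not reject H0: correct decision. Peter gets treatment.
• H0 is TRUE, reject H0: Type I error. Peter has gastritis, but the doctor says he does not, so he gets no treatment.
• H0 is FALSE, do not reject H0: Type II error. Peter does not have gastritis, but the doctor prescribes treatment anyway.
• H0 is FALSE, reject H0: correct decision. Peter does not have gastritis and gets no treatment.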
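Example: H0 is the hypothesis that we need a high-speed railway. "Do not reject H0" means the railway is judged necessary; "Reject H0" means it is judged unnecessary.
• H0 is TRUE, do not reject H0: correct decision. The railway is built.
• H0 is TRUE, reject H0: Type I error. The railway is wanted and needed, but it is not built.
• H0 is FALSE, do not reject H0: Type II error. The railway is not really needed, but the result says to build it, so it has to be built.
• H0 is FALSE, reject H0: correct decision. The railway is not built, and it was truly not needed.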
Five steps for testing hypothesis
3. Select the test statistic
z-test
$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
t-test
$\bar{X}$ = mean of the sample; $\mu$ = mean of the population (normal distribution); $\sigma$ = standard deviation; $n$ = sample size
Five steps for testing hypothesis
4. Formulate the decision rule
Compare the z value with the critical value.
[Figure: sampling distribution marking the "do not reject H0" region, the region of rejection, and the critical value]
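As an illustration of the decision rule, the sketch below derives the two-tailed critical z value for a chosen significance level and compares a test statistic against it. It uses `statistics.NormalDist` from the Python standard library; the function name `decide` and the sample z values are hypothetical and do not come from the slides.

```python
from statistics import NormalDist

alpha = 0.05  # significance level chosen in step 2

# Two-tailed test: alpha is split between both tails of the standard normal curve.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

def decide(z, critical):
    """Reject H0 when the statistic falls in the region of rejection."""
    return "reject H0" if abs(z) > critical else "do not reject H0"

print(round(z_critical, 2))     # critical value for alpha = 0.05
print(decide(2.5, z_critical))  # hypothetical z value beyond the critical value
print(decide(1.2, z_critical))  # hypothetical z value inside the acceptance region
```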
Hypothesis test for one population mean
Null hypothesis    Alternative hypothesis
H0: μ = μ0         Ha: μ <> μ0
H0: μ >= μ0        Ha: μ < μ0
H0: μ <= μ0        Ha: μ > μ0
μ0: a constant
H0 is read "H-sub-zero", the null hypothesis
Ha is read "H-sub-a" (also written H1, "H-sub-one"), the alternative hypothesis
Key difference between H0 and Ha
• Null hypothesis – the main hypothesis – the normal condition or situation
• Alternative hypothesis – the different case – what the researcher thinks
Example: the electrical supply is rated at 220 V, and the tester thinks it is not 220 V.
H0: μ = 220 V
Ha: μ <> 220 V
Example2: Weight of snack food
• H0: μ = 454 grams
– Meaning: the packaging machine is working accurately.
• Ha: μ <> 454 grams
– Meaning: the packaging machine is not working accurately.
• The significance level
– alpha = 0.05
The population mean (μ) is hypothesized to equal 454 grams.
A significance level of 0.05 corresponds to 95% confidence.
• alpha = 0.05
• p = 0.033
• [alpha > p] = True; 0.05 > 0.033, so reject H0
• Conclusion: at the 0.05 significance level, the packaging machine does not pack bags at 454 grams properly.
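The test statistic behind this result can be checked by hand. The sketch below computes the one-sample t statistic for the snack data using only the Python standard library. The critical value 2.069 (two-tailed, alpha = 0.05, df = 23) is taken from a standard t table and is an assumption of this sketch, not a figure from the slides; the p value itself is left to PSPP.

```python
import math
import statistics

# Bag weights in grams from Example 1 (n = 24).
weights = [465, 456, 438, 454, 447, 449, 442, 449, 446, 447, 468, 433,
           454, 463, 450, 446, 447, 456, 452, 444, 447, 456, 456, 435]
mu0 = 454  # hypothesized population mean under H0

n = len(weights)
standard_error = statistics.stdev(weights) / math.sqrt(n)
t_stat = (statistics.mean(weights) - mu0) / standard_error
df = n - 1

# Assumed table value: two-tailed, alpha = 0.05, df = 23.
T_CRITICAL = 2.069
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

Comparing |t| with the critical value leads to the same decision as comparing p with alpha.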
Example3: Calcium levels
• A nutritionist thinks the average person with an income below the poverty level gets less than 800 mg of calcium.
• Sample of 18 people living below the poverty level
• Data
– 686, 433, 743, 647, 734, 641, 993, 620, 574, 634, 850, 858, 992, 775, 1113, 672, 879, 609
• Question: do the data provide sufficient evidence to conclude that the mean calcium intake of all people with income below the poverty level is less than 800 mg?
Glossary: poverty (n): the state of being poor; intake (n): the amount consumed
Example3: Calcium levels
• H0: μ >= 800 mg
• Ha: μ < 800 mg
• Significance level
– alpha = 0.05
• Data
– 686, 433, 743, 647, 734, 641, 993, 620, 574, 634, 850, 858, 992, 775, 1113, 672, 879, 609
• p = 0.212 (two-tailed) / 2 = 0.106 (one-tailed)
• [alpha > p] = False; 0.05 > 0.106 does not hold, so do not reject H0
• Conclusion: at the 0.05 significance level, the data do not provide sufficient evidence that the mean calcium intake of people below the poverty level is less than 800 mg.
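A one-tailed version of the same calculation can be sketched for the calcium data. Because Ha is μ < 800 mg, only a sufficiently negative t statistic leads to rejection. The critical value 1.740 (one-tailed, alpha = 0.05, df = 17) comes from a standard t table and is an assumption of this sketch, not a figure from the slides.

```python
import math
import statistics

# Daily calcium intake in mg for the 18 sampled people.
calcium = [686, 433, 743, 647, 734, 641, 993, 620, 574, 634, 850, 858,
           992, 775, 1113, 672, 879, 609]
mu0 = 800  # boundary value of H0: mu >= 800 mg

n = len(calcium)
standard_error = statistics.stdev(calcium) / math.sqrt(n)
t_stat = (statistics.mean(calcium) - mu0) / standard_error
df = n - 1

# Assumed table value: one-tailed, alpha = 0.05, df = 17.
T_CRITICAL = 1.740
reject_h0 = t_stat < -T_CRITICAL  # left-tailed test because Ha is mu < 800

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```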
Independent sample T-test
T-test: Two groups
Null hypothesis    Alternative hypothesis
H0: μ1 = μ2        Ha: μ1 <> μ2
H0: μ1 >= μ2       Ha: μ1 < μ2
H0: μ1 <= μ2       Ha: μ1 > μ2
Example4: Hospital costs
• Sampled costs per day at public hospitals and private hospitals
• public hospital
– 633, 616, 659, 535, 666, 675, 524, 746, 585, 748, 696, 609
• private hospital
– 790, 587, 997, 735, 852, 686, 839, 545, 724, 554, 889, 797, 722, 484, 579
Levene's test for equality of variances
• H0: σ1² = σ2²; Ha: σ1² <> σ2²
• alpha = 0.05
• p = 0.031/2 = 0.0155
• [alpha > p] = True, so reject H0
Equal variances not assumed
• H0: μ1 >= μ2; Ha: μ1 < μ2
• alpha = 0.05
• p = 0.086/2 = 0.043
• [alpha > p] = True, so reject H0
• Conclusion: the average cost of public hospitals is lower than that of private hospitals at the 0.05 significance level.
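The "equal variance not assumed" row corresponds to Welch's t-test. The sketch below computes its t statistic and approximate degrees of freedom (Welch-Satterthwaite) for the hospital data with the Python standard library. The helper name `welch_t` is illustrative, and the critical value 1.721 (one-tailed, alpha = 0.05, df = 21) is taken from a standard t table as an assumption of this sketch.

```python
import math
import statistics

# Cost per day, copied from the slide.
public = [633, 616, 659, 535, 666, 675, 524, 746, 585, 748, 696, 609]
private = [790, 587, 997, 735, 852, 686, 839, 545, 724, 554, 889, 797,
           722, 484, 579]

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on samples a and b."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t_stat, df = welch_t(public, private)

# Assumed table value: one-tailed, alpha = 0.05, df rounded down to 21.
T_CRITICAL = 1.721
reject_h0 = t_stat < -T_CRITICAL  # left-tailed test because Ha is mu1 < mu2

print(f"t = {t_stat:.3f}, df = {df:.1f}, reject H0: {reject_h0}")
```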
Paired-sample T test
Two dependent groups
• The mean values of the two groups are dependent, or related. Typical applications:
– Examination: Pre-test/Post-test
– Applying something: Before/After
– Similarity between someone or something: A/B
Example 5: running
• An exercise physiologist wants to determine whether a certain type of running program will reduce heart rates.
• A sample of 15 people is taken, and their heart rates are recorded before the program and after one year of exercise.
Person  Before  After
1       68      67
2       76      77
3       74      74
4       71      74
5       71      69
6       72      70
7       75      71
8       83      77
9       75      71
10      74      74
11      76      73
12      77      68
13      79      71
14      75      72
15      75      77
Example 5: running
• H0: μ1 <= μ2; Ha: μ1 > μ2
• μ1: the mean heart rate before the program
• μ2: the mean heart rate after the program
• The significance level
– alpha = 0.05
• p = 0.018/2 = 0.009
• [alpha > p] = True, so reject H0
• Conclusion: the running program reduces heart rate at the 0.05 significance level.
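A paired-sample t-test is a one-sample t-test on the per-person differences. The sketch below applies that idea to the heart-rate table using the Python standard library. The critical value 1.761 (one-tailed, alpha = 0.05, df = 14) is taken from a standard t table and is an assumption of this sketch, not a figure from the slides.

```python
import math
import statistics

# Heart rates for persons 1..15, copied from the table.
before = [68, 76, 74, 71, 71, 72, 75, 83, 75, 74, 76, 77, 79, 75, 75]
after = [67, 77, 74, 74, 69, 70, 71, 77, 71, 74, 73, 68, 71, 72, 77]

# Positive difference = heart rate dropped after the program.
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
standard_error = statistics.stdev(diffs) / math.sqrt(n)
t_stat = mean_diff / standard_error
df = n - 1

# Assumed table value: one-tailed, alpha = 0.05, df = 14.
T_CRITICAL = 1.761
reject_h0 = t_stat > T_CRITICAL  # right-tailed test because Ha is mu1 > mu2

print(f"mean difference = {mean_diff}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```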
ANOVA (ANalysis Of Variance)
Acceptance region/ Rejection region
Example 6: Bearing Vibration
• A hard-disk company tests the vibration of bearings to be installed in its hard disks. Bearings from five brands are tested, 6 items per brand, with the results below.
• The company wants to know: do the brands of bearing differ?
Brand1 Brand2 Brand3 Brand4 Brand5
13.1 16.3 13.7 15.7 13.5
15 15.7 13.9 13.7 13.4
14 17.2 12.4 14.4 13.2
14.4 14.9 13.8 16 12.7
14 14.4 14.9 13.9 13.4
11.6 17.2 13.3 14.7 12.3
Example 6: Bearing Vibration
• Hypothesis
– H0: μ1 = μ2 = μ3 = μ4 = μ5
– Ha: at least one brand mean differs from the others
• Significance level
– alpha = 0.05
• Decision
1) [F > F critical value] = True
2) [alpha > p] = True
So reject H0
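The F statistic for this example can be reproduced from the table with a short one-way ANOVA calculation. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the critical value 2.76 for F(4, 25) at alpha = 0.05 is taken from a standard F table as an assumption of this sketch.

```python
import statistics

# Vibration measurements, 6 bearings per brand, copied from the table.
brands = {
    "Brand1": [13.1, 15, 14, 14.4, 14, 11.6],
    "Brand2": [16.3, 15.7, 17.2, 14.9, 14.4, 17.2],
    "Brand3": [13.7, 13.9, 12.4, 13.8, 14.9, 13.3],
    "Brand4": [15.7, 13.7, 14.4, 16, 13.9, 14.7],
    "Brand5": [13.5, 13.4, 13.2, 12.7, 13.4, 12.3],
}

groups = list(brands.values())
all_values = [x for g in groups for x in g]
grand_mean = statistics.mean(all_values)

# Variation between brand means and variation inside each brand.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1               # 5 brands - 1
df_within = len(all_values) - len(groups)  # 30 items - 5 brands
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Assumed table value: F(4, 25) at alpha = 0.05.
F_CRITICAL = 2.76
reject_h0 = f_stat > F_CRITICAL

print(f"F({df_between}, {df_within}) = {f_stat:.2f}, reject H0: {reject_h0}")
```

Rejecting H0 only says that the brands do not all share the same mean vibration; it does not say which brands differ.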
Critical Value of F
• http://vassarstats.net/textbook/apx_d.html
• http://www.danielsoper.com/statcalc3/calc.aspx?id=4
Pearson correlation Spearman correlation
Regression analysis
Summary
• T-test
• Z-test
• H0
• Ha
• Type I and Type II errors
• Significance level