2017 probstat - 01 introduction

Page 1

1/18/2017

@jimlecturer

wp.me/4sCVe jimlecturer

752A4C6B

01 ‐ Introduction

prepared by jimmyhasugian

Page 2

Probability and statistics are deeply connected because all statistical statements are, at bottom, statements about probability.

Probability vs. Statistics

Probability

• Logically self-contained
• A few rules for computing
• One correct answer

Statistics

• Messier and more of an art
• Get experimental data and try to draw probabilistic conclusions
• No single correct answer

Probability vs. Statistics

Probability example

What is the probability of getting exactly 3 heads in 5 tosses of a fair coin?

Statistics example

You have a coin of unknown provenance. You toss it 5 times and count the number of heads; say you count 3 heads. Your job as a statistician is to draw a conclusion (inference) from this data.

Different statisticians might draw different conclusions.
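The probability question has exactly one answer that follows from a few rules: by the binomial formula, C(5, 3)(1/2)^3(1/2)^2 = 10/32 = 5/16. A minimal Python sketch (the function name `binomial_pmf` is illustrative, not from the slides) computes it exactly and cross-checks the answer by listing all 32 equally likely outcomes:

```python
from fractions import Fraction
from itertools import product
from math import comb


def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability question: exactly 3 heads in 5 tosses of a fair coin.
prob = binomial_pmf(3, 5, Fraction(1, 2))
print(prob, float(prob))  # 5/16 0.3125

# Cross-check by brute force: enumerate all 2**5 equally likely outcomes.
outcomes = list(product("HT", repeat=5))
favorable = sum(1 for outcome in outcomes if outcome.count("H") == 3)
print(favorable, len(outcomes))  # 10 32
```

Using `Fraction` keeps the arithmetic exact, which mirrors the "one correct answer" character of probability.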

Page 3

Two schools of statistics (sometimes conflicting)

Frequentist vs. Bayesian Interpretation

Different interpretations of the meaning of probability

Frequentist: probability measures the frequency of various outcomes of an experiment.

Bayesian: probability is an abstract concept that measures a state of knowledge or a degree of belief in a given proposition.
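To make the contrast concrete for the coin example (3 heads in 5 tosses), here is a sketch of how each school might estimate P(heads). The frequentist estimate is the observed relative frequency; the Bayesian version assumes a uniform Beta(1, 1) prior, which is a choice made here for illustration and is not specified in the slides. Both function names are hypothetical.

```python
from fractions import Fraction


def frequentist_estimate(heads: int, tosses: int) -> Fraction:
    """Relative frequency of heads (the maximum-likelihood estimate of P(heads))."""
    return Fraction(heads, tosses)


def bayesian_posterior_mean(heads: int, tosses: int, a: int = 1, b: int = 1) -> Fraction:
    """Posterior mean of P(heads) under an assumed Beta(a, b) prior.

    With a binomial likelihood, a Beta(a, b) prior updates to
    Beta(a + heads, b + tosses - heads), whose mean is returned here.
    """
    return Fraction(a + heads, a + b + tosses)


print(frequentist_estimate(3, 5))     # 3/5
print(bayesian_posterior_mean(3, 5))  # 4/7
```

The two answers differ (0.6 vs. about 0.571), and the Bayesian answer would change again under a different prior, which is one concrete way different statisticians reach different conclusions from the same data.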

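Relationship between Probability & Statistics

• Population: the collection of all individuals or individual items of a particular type
• Sample: a collection of observations
• Probability: deductive reasoning, from the population to the sample
• Statistics: inductive reasoning, from the sample to the population

(Figure labels: Mid Test, Final Test)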

Page 4

Sampling Procedures

Simple random sampling implies that any particular sample of a specified sample size has the same chance of being selected as any other sample of the same size.

Sample size: the number of elements in the sample.

(Figure labels: Population, Sample)
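Python's `random.Random.sample` draws without replacement and gives every subset of a given size the same chance of being chosen, which matches the definition of simple random sampling. A minimal sketch, assuming a hypothetical population of numbered families; the helper name `simple_random_sample` is illustrative:

```python
import random
from collections import Counter


def simple_random_sample(population, n, rng):
    """Draw n distinct units so that every size-n subset is equally likely."""
    return rng.sample(population, n)


rng = random.Random(2017)  # fixed seed so the run is reproducible
families = list(range(1, 10_001))  # hypothetical population of 10,000 family IDs
sample = simple_random_sample(families, 1000, rng)
print(len(sample), len(set(sample)))  # 1000 1000

# Empirical check on a tiny population: there are C(4, 2) = 6 possible samples
# of size 2, and each should turn up in roughly 1/6 of repeated draws.
counts = Counter(frozenset(simple_random_sample([1, 2, 3, 4], 2, rng)) for _ in range(60_000))
for subset, count in sorted(counts.items(), key=lambda item: sorted(item[0])):
    print(sorted(subset), round(count / 60_000, 3))
```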

Example

A sample is to be chosen to answer certain questions regarding political preferences in a certain state in the United States. The sample involves 1000 families, and a survey is to be conducted.

Suppose random sampling is not used, and all or nearly all of the 1000 families chosen live in an urban setting.

It is believed that political preferences in rural areas differ from those in urban areas. In other words, the sample drawn actually confined the population, and thus the inferences need to be confined to the “limited population.”

If, indeed, the inferences need to be made about the state as a whole, the sample of size 1000 described here is often referred to as a biased sample.
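A small simulation shows why such a sample is biased. The population sizes and preference rates below are invented for illustration only; the point is that an urban-only sample estimates the urban preference, not the statewide one.

```python
import random

# Hypothetical state (all numbers are assumptions for illustration):
# 60,000 urban families, 70% of whom favor candidate A;
# 40,000 rural families, 40% of whom favor candidate A.
urban = [("urban", i < 42_000) for i in range(60_000)]
rural = [("rural", i < 16_000) for i in range(40_000)]
population = urban + rural


def share_favoring(families):
    """Fraction of the given families that favor candidate A."""
    return sum(1 for _, favors_a in families if favors_a) / len(families)


rng = random.Random(2017)
true_share = share_favoring(population)  # (42,000 + 16,000) / 100,000 = 0.58

# Simple random sample from the whole state vs. a sample drawn only from urban families.
random_estimate = share_favoring(rng.sample(population, 1000))
urban_only_estimate = share_favoring(rng.sample(urban, 1000))

print(f"true share:        {true_share:.2f}")
print(f"random sample:     {random_estimate:.3f}")
print(f"urban-only sample: {urban_only_estimate:.3f}")
```

The random sample lands near the true statewide share, while the urban-only sample lands near 0.70 no matter how large it is: increasing the sample size does not remove this kind of bias.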

Page 5

Example: http://www.republika.co.id/berita/nasional/jabodetabek-nasional/16/12/20/oihcox361-ini-hasil-survei-terakhir-5-lembaga-riset-di-pilkada-dki

Latest surveys by five research institutes on the Jakarta gubernatorial election (pilkada DKI):

• Poltracking (Hanta Yuda), 7-17 November
• Indikator Politik Indonesia (Burhanuddin Muhtadi), 15-22 November
• Charta Politika, 7-24 November
• Lingkaran Survei Indonesia (Denny JA), 3-8 December
• Lembaga Survei Indonesia (Dodi Ambardi), 3-11 December

Percentages shown on the slide, in order: 29.66%, 18.9%, 14.9%, 15.7%, 17.8%

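Sampling Procedures

Simple random sampling is not always appropriate! Which alternative approach is used depends on the complexity of the problem.

When the sampling units are not homogeneous and naturally divide themselves into non-overlapping groups that are homogeneous, these groups are called strata, and stratified random sampling selects a simple random sample within each stratum.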
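A sketch of stratified random sampling: a simple random sample is drawn inside each stratum. How the total sample size is split across strata is a design decision; proportional allocation (each stratum's share matches its share of the population) is assumed here, with largest-remainder rounding so the parts add up exactly. The strata and all function names are hypothetical.

```python
import random


def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size.

    Largest-remainder rounding guarantees the allocations add up to n.
    """
    total = sum(stratum_sizes.values())
    exact = {name: n * size / total for name, size in stratum_sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


def stratified_random_sample(strata, n, rng):
    """Take a simple random sample within each stratum."""
    alloc = proportional_allocation({name: len(units) for name, units in strata.items()}, n)
    return {name: rng.sample(units, alloc[name]) for name, units in strata.items()}


# Hypothetical strata: 60,000 urban and 40,000 rural family IDs.
strata = {"urban": list(range(60_000)), "rural": list(range(60_000, 100_000))}
sample = stratified_random_sample(strata, 1000, random.Random(7))
print({name: len(units) for name, units in sample.items()})  # {'urban': 600, 'rural': 400}
```

Because every stratum is guaranteed its share, the urban-only bias from the earlier survey example cannot occur by construction.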

Page 6

Experimental Design

The concept of randomness or random assignment plays a huge role in the area of experimental design.

Treatments or treatment combinations become the populations to be studied or compared in some sense.

There is considerable variability due to the experimental units, so random assignment of units to treatments is very important.

Example

A study was conducted at the Virginia Polytechnic Institute and State University on the development of a relationship between the roots of trees and the action of a fungus. Minerals are transferred from the fungus to the trees and sugars from the trees to the fungus. Two samples of 10 northern red oak seedlings were planted in a greenhouse, one containing seedlings treated with nitrogen and the other containing seedlings with no nitrogen. All other environmental conditions were held constant. All seedlings contained the fungus Pisolithus tinctorus. The purpose of the experiment is to determine whether the use of nitrogen has an influence on the growth of the roots.

Two treatment combinations (nitrogen and no nitrogen) define two separate populations.
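The slides do not say how the seedlings were split between the two treatments. A minimal sketch of random assignment, assuming 20 seedlings with hypothetical labels: shuffle them and give the first 10 the nitrogen treatment, so differences among seedlings are spread across the groups by chance rather than by the experimenter's choice. The function name `randomly_assign` is illustrative.

```python
import random


def randomly_assign(units, group_size, rng):
    """Shuffle the experimental units and split them into two treatment groups."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return shuffled[:group_size], shuffled[group_size:]


# Hypothetical labels for the 20 northern red oak seedlings.
seedlings = [f"seedling-{i:02d}" for i in range(1, 21)]
nitrogen, no_nitrogen = randomly_assign(seedlings, 10, random.Random(140))
print("nitrogen:   ", sorted(nitrogen))
print("no nitrogen:", sorted(no_nitrogen))
```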

Page 7

Experimental Design

It is necessary to quantify the nature of the sample. Descriptive statistics do this through:

• the center of location of the data
• the variability in the data

Measures of Location

Measures of location provide the analyst with a quantitative value of where the center, or some other location, of the data is located.

The sample mean is simply the numerical average: x̄ = (x1 + x2 + ... + xn) / n.

Page 8

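Measures of Location

The sample median reflects the central tendency of the sample in such a way that it is uninfluenced by extreme values or outliers. With the observations arranged in increasing order, it is the middle value when n is odd and the average of the two middle values when n is even.

Example

Data: 1.7, 2.2, 3.9, 3.11, and 14.7. Sample mean x̄ = 5.12; sample median x̃ = 3.9.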

Measures of Location

There are several other methods of quantifying the center of location of the data in the sample. One is the trimmed mean, computed by “trimming away” a certain percent of both the largest and the smallest set of values, with the data arranged in increasing order, and then averaging the values that remain.
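A short sketch of the trimmed mean. How to handle a trim percentage that does not remove a whole number of observations varies between textbooks and software; rounding the count down, as below, is an assumption. The function name `trimmed_mean` is illustrative, and the data are the five values from the median example (1.7, 2.2, 3.9, 3.11, 14.7):

```python
def trimmed_mean(data, percent):
    """Mean after dropping `percent`% of the values from each end of the sorted data.

    Assumption: the number trimmed from each end is rounded down, so a trim
    that would remove only part of an observation removes nothing.
    """
    if not 0 <= percent < 50:
        raise ValueError("percent must be in [0, 50)")
    values = sorted(data)  # arrange in increasing order
    k = int(len(values) * percent / 100)
    kept = values[k:len(values) - k]
    return sum(kept) / len(kept)


data = [1.7, 2.2, 3.9, 3.11, 14.7]
print(round(trimmed_mean(data, 20), 2))  # 3.07: drops 1.7 and 14.7, averages the rest
```

With only five observations, a 10% trim rounds down to removing nothing, so it returns the ordinary mean 5.122; trimming is more useful on larger samples.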

Page 9

Example: The stem weights, in grams, were recorded at the end of 140 days.

Measures of Variability

Measures of location in a sample do not by themselves provide a proper summary of the nature of a data set; the variability of the data must also be described.

Process Variability

Measures of Spread or Variability

Page 10

Measures of Variability

Degrees of Freedom

Which Measures are more important?

Population Mean

Population Variance
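The formulas on this slide did not survive extraction. For reference, the standard definitions are sketched below for a sample of n values and a finite population of N values. The sample variance divides by n - 1, its degrees of freedom: the deviations x_i - x̄ always sum to zero, so only n - 1 of them are free to vary.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2,
\qquad
s = \sqrt{s^2}

\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2
```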

Page 11

Example

Compute the sample variance and sample standard deviation.

Answer
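The data for this exercise did not survive extraction. As a sketch, here is the computation applied instead to the five values from the median example (1.7, 2.2, 3.9, 3.11, 14.7). The helper names are hypothetical; the result is cross-checked against `statistics.variance` and `statistics.stdev`, which also use the n - 1 divisor.

```python
import math
import statistics


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1 degrees of freedom."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_std(data):
    """Sample standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(data))


data = [1.7, 2.2, 3.9, 3.11, 14.7]
print(round(sample_variance(data), 5))  # 29.38192
print(round(sample_std(data), 4))       # 5.4205
print(math.isclose(sample_variance(data), statistics.variance(data)))  # True
```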

Page 12

Graphical Diagnostics

• Scatter plot
• Stem-and-leaf plot
• Histogram
• Box-and-whisker plot

Scatter Plot

Page 13

Stem-and-Leaf Plot

A stem-and-leaf plot is a combination of tabular and graphic display.
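A minimal text-mode stem-and-leaf plot, assuming non-negative integer data where the stem is the tens part and the leaf is the units digit (other data need a different stem unit). The data and the function name are hypothetical. Each row is a table entry, and the row lengths form a sideways bar chart:

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return the lines of a stem-and-leaf plot for non-negative integers.

    The stem is the tens part (value // 10) and the leaf is the units digit.
    """
    stems = defaultdict(list)
    for value in sorted(data):
        stems[value // 10].append(value % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems visible
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines


for line in stem_and_leaf([23, 12, 41, 15, 37, 23, 21]):  # hypothetical data
    print(line)
# 1 | 25
# 2 | 133
# 3 | 7
# 4 | 1
```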

Page 14

Histogram

Dividing each class frequency by the total number of observations gives the relative frequency of each class; a histogram can display these relative frequencies.
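A sketch of the relative-frequency computation behind a histogram. The class limits and data are hypothetical, and the convention that each class includes its lower limit (with the last class also including its upper limit) is an assumption; textbooks differ on boundary handling.

```python
from bisect import bisect_right


def relative_frequencies(data, edges):
    """Relative frequency of each class [edges[i], edges[i+1]).

    The last class also includes its upper edge, so every value in
    [edges[0], edges[-1]] falls into exactly one class.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        if not edges[0] <= x <= edges[-1]:
            raise ValueError(f"{x} lies outside the class limits")
        i = min(bisect_right(edges, x) - 1, len(counts) - 1)
        counts[i] += 1
    n = len(data)
    return [count / n for count in counts]  # class frequency / total observations


data = [1.2, 1.8, 2.5, 2.7, 3.1, 3.3, 3.9, 4.0]  # hypothetical observations
print(relative_frequencies(data, [1, 2, 3, 4]))  # [0.25, 0.25, 0.5]
```

The relative frequencies always sum to 1, which is what lets histograms of samples with different sizes be compared on the same scale.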

Page 15

Box-and-Whisker Plot or Box Plot

This plot encloses the interquartile range of the data in a box that has the median displayed within. For reasonably large samples, the display shows the center of location, the variability, and the degree of asymmetry.

(Figure label: symmetric)
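The numbers a box plot draws can be computed with `statistics.quantiles`. Quartile conventions differ between textbooks and software; Python's default `method="exclusive"` is used here, so other tools may report slightly different quartiles for the same data. The data and the function name are hypothetical.

```python
import statistics


def box_plot_summary(data):
    """Five-number summary plus the interquartile range used to draw a box plot."""
    q1, median, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
    return {
        "min": min(data),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(data),
        "iqr": q3 - q1,  # width of the box
    }


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]  # hypothetical, with one large value
print(box_plot_summary(data))
# {'min': 1, 'q1': 3.0, 'median': 6.0, 'q3': 9.0, 'max': 50, 'iqr': 6.0}
```

Here the median sits in the middle of the box, but the maximum lies far beyond Q3, so a box plot of these data would show a long upper tail: the degree of asymmetry the slide mentions.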

Page 16

(Figure: box plot with individual points labeled "outlier data")
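The slide marks some points as outlier data but does not state the rule used. A common convention (Tukey's fences) flags observations lying more than 1.5 interquartile ranges below Q1 or above Q3; the sketch below assumes that rule and Python's default quartile method. The data and function names are hypothetical.

```python
import statistics


def iqr_fences(data, k=1.5):
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(data, k=1.5):
    """Values outside the fences, which a box plot draws as separate points."""
    low, high = iqr_fences(data, k)
    return [x for x in data if x < low or x > high]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]  # hypothetical
print(iqr_fences(data))  # (-6.0, 18.0)
print(outliers(data))    # [50]
```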