Data management isu

By Sarah Clark

Page 1: Data management isu
Page 2: Data management isu

Thesis: Competition to get into a Post-Secondary institution has greatly increased and the change in the school curriculum has, and will continue to, affect the graduates of 2003.

Page 3: Data management isu

54 % are females, 46 % are males

54 % are in Grade 12, 46 % are in OAC

The majority of students surveyed want to go to University

College is the second highest choice

More females want to go to University than males

Most people applied to only 3 Universities

Of those people, the majority were males; females typically applied to more than 3 Universities

Page 4: Data management isu

75 % of those getting 90 - 99% are females

75 % of those who have a 90 - 99% average are Gr. 12’s

48 % of the students who have an 80 - 89 % average are OAC’s

Everyone who wants to go to college next year has an average of 70 to 79 %

All students with an average higher than 90 %, and most of those with an 80 - 89 % average, want to attend University

Most OAC’s fall into this group

Page 5: Data management isu

Sampling Technique:

My survey used convenience sampling, since I asked the students in my classes if they would respond to it.

Bias:

Since I received most of my data through my classes, the survey has some bias. A large number of the students in those classes are taking University courses, and the survey covers OAC’s and Gr. 12’s only. This could be classified as a sampling bias, since the students surveyed do not represent the whole population. Therefore, the results lean towards those who are interested in University and under-represent those who may be working instead of attending a post-secondary institution.

Page 6: Data management isu

Frequency Distribution Table and Weighted Means

Medians and Modes

Standard Deviations

Z-scores

Percentiles

Page 7: Data management isu

GRADE 12

Marks       Freq.   Mid Pt.
0-49 %      23      24.5
50-59 %     30      54.5
60-69 %     68      64.5
70-79 %     79      74.5
80-89 %     51      84.5
90-99 %     6       94.5

Weighted Mean = 67.5 %

OAC

Marks       Freq.   Mid Pt.
0-49 %      16      24.5
50-59 %     24      54.5
60-69 %     49      64.5
70-79 %     63      74.5
80-89 %     62      84.5
90-99 %     15      94.5

Weighted Mean = 70.8 %
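As a rough check, the two weighted means can be recomputed from the class midpoints and frequencies. This is only a sketch: the variable and function names are illustrative, and it assumes the table with 257 students is Grade 12 and the table with 229 students is OAC, which matches the totals used for the medians.

```python
# Class midpoints and frequencies copied from the two frequency tables.
midpoints = [24.5, 54.5, 64.5, 74.5, 84.5, 94.5]
grade12_freq = [23, 30, 68, 79, 51, 6]   # assumed Grade 12 (257 students)
oac_freq = [16, 24, 49, 63, 62, 15]      # assumed OAC (229 students)


def weighted_mean(midpoints, freqs):
    """Mean of grouped data: sum(freq * midpoint) / sum(freq)."""
    total = sum(f * m for f, m in zip(freqs, midpoints))
    return total / sum(freqs)


grade12_mean = weighted_mean(midpoints, grade12_freq)
oac_mean = weighted_mean(midpoints, oac_freq)
print(round(grade12_mean, 1), round(oac_mean, 1))
```

Because each class is represented by its midpoint, the result is an estimate of the true mean, not an exact value.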

Page 8: Data management isu

GRADE 12

Median:

257 / 2 = 129th position (rounded up)

= 70 - 79 %

OAC

Median:

229 / 2 = 115th position (rounded up)

= 70 - 79 %

Mode:

Most students are part of the 70 - 79 % range in both grades. In Grade 12 there are 79 students (or 30.7 %) in this range and in OAC there are 63 students (or 27.5 %) with this average.

* MMR stats. from January 2003
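The median and modal classes can be located by walking the cumulative frequencies. The sketch below assumes the frequency tables from the previous slide and takes the median position as (n + 1) / 2, which gives the 129th and 115th positions for these two odd totals. The helper names are illustrative.

```python
from itertools import accumulate

classes = ["0-49", "50-59", "60-69", "70-79", "80-89", "90-99"]
grade12_freq = [23, 30, 68, 79, 51, 6]
oac_freq = [16, 24, 49, 63, 62, 15]


def median_class(classes, freqs):
    """Return (median position, class that contains that position)."""
    n = sum(freqs)
    position = (n + 1) // 2  # middle position when n is odd
    for label, cumulative in zip(classes, accumulate(freqs)):
        if cumulative >= position:
            return position, label


def modal_class(classes, freqs):
    """Return the class with the highest frequency."""
    return classes[freqs.index(max(freqs))]


print(median_class(classes, grade12_freq), modal_class(classes, grade12_freq))
print(median_class(classes, oac_freq), modal_class(classes, oac_freq))
```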

Page 9: Data management isu

Standard Deviations: Grade 12 = 16.6, OAC = 16.7

This shows that the Grade 12 averages are slightly less spread out than the OAC averages.

According to the Normal Distribution curve, about 68 % of the data should lie within one standard deviation of the mean.

Grade 12: 67.5 +/- 16.6 = 50.9 % to 84.1 %

OAC: 70.8 +/- 16.7 = 54.1 % to 87.5 %

Therefore, if the marks are roughly normally distributed, about 68 % of the students in each grade fall into these ranges.
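The standard deviations can be estimated from the grouped data in the same way as the weighted means. This sketch assumes the population form of the formula (dividing by N, not N - 1) and uses the class midpoints from the frequency tables. The function names are illustrative.

```python
from math import sqrt

midpoints = [24.5, 54.5, 64.5, 74.5, 84.5, 94.5]
grade12_freq = [23, 30, 68, 79, 51, 6]
oac_freq = [16, 24, 49, 63, 62, 15]


def grouped_mean_and_sd(midpoints, freqs):
    """Mean and population standard deviation of grouped data."""
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
    variance = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / n
    return mean, sqrt(variance)


def one_sd_range(mean, sd):
    """Interval from one standard deviation below to one above the mean."""
    return mean - sd, mean + sd


grade12_mean, grade12_sd = grouped_mean_and_sd(midpoints, grade12_freq)
oac_mean, oac_sd = grouped_mean_and_sd(midpoints, oac_freq)
print(round(grade12_sd, 1), round(oac_sd, 1))
print(one_sd_range(round(grade12_mean, 1), round(grade12_sd, 1)))
print(one_sd_range(round(oac_mean, 1), round(oac_sd, 1)))
```

The 68 % figure applies only to the extent that the marks follow a normal distribution, so the ranges are approximate.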

Page 10: Data management isu

Scenario:

A Grade 12 student (“Student A”) and an OAC student (“Student B”) both receive a final average mark of 78 % in Math. Their grade averages are 67.5 % and 70.8 %, and the standard deviations are 16.6 and 16.7.

Student A

z = (78 - 67.5) / 16.6

= 0.6325

Student B

z = (78 - 70.8) / 16.7

= 0.4311

Therefore, this shows that Student A actually had the better score, since Student A’s mark is further above the grade average when measured in standard deviations.
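The comparison can be recomputed directly from the means and standard deviations given in the scenario. This is a minimal sketch, and the function name is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above the mean."""
    return (x - mean) / sd


z_a = z_score(78, 67.5, 16.6)  # Student A, Grade 12
z_b = z_score(78, 70.8, 16.7)  # Student B, OAC
better = "Student A" if z_a > z_b else "Student B"
print(round(z_a, 4), round(z_b, 4), better)
```

The higher z-score belongs to the student whose mark stands out more from the rest of the grade.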

Page 11: Data management isu

In order to receive the 2002 MMR Scholarship last year a student had to be in the 75th percentile or higher. But due to the Double Cohort this year, students must now be in the 80th percentile or above to get the award for 2003.

Scenario:

Emma received a score of 30 last year and won the award. Based on the matrix below, would she have still earned it if she was in the Double Cohort this year?

MATRIX Scores for Year 2003 (Double Cohort):

2 5 12 15 19 23 27 29 33 39

4 7 13 16 20 24 27 29 33 40

4 8 13 18 21 26 28 30 35 41

4 9 14 18 21 27 29 32 38 41

Page 12: Data management isu

Solution:

Percentile = [(# of scores below x) + 0.5 (# of scores equal to x)] / (total # of scores) x 100

= [30 + 0.5 (1)] / 40 x 100

= 30.5 / 40 x 100

= 76.25

= 77th percentile (rounded up)

Therefore, Emma is in the 77th percentile based on the information in the Double Cohort year. Due to the increased competition, she does not qualify for the Scholarship this year, since she is below the 80th percentile.
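The percentile calculation can be checked by counting the scores in the 2003 matrix. This sketch assumes the 40 values listed for the Double Cohort year and the 80th percentile cut-off stated in the scenario. The names are illustrative.

```python
# The 40 matrix scores for 2003 (Double Cohort), row by row.
scores_2003 = [
    2, 5, 12, 15, 19, 23, 27, 29, 33, 39,
    4, 7, 13, 16, 20, 24, 27, 29, 33, 40,
    4, 8, 13, 18, 21, 26, 28, 30, 35, 41,
    4, 9, 14, 18, 21, 27, 29, 32, 38, 41,
]


def percentile_rank(scores, x):
    """(scores below x + 0.5 * scores equal to x) / total * 100."""
    below = sum(1 for s in scores if s < x)
    equal = sum(1 for s in scores if s == x)
    return (below + 0.5 * equal) / len(scores) * 100


emma_rank = percentile_rank(scores_2003, 30)
qualifies_2003 = emma_rank >= 80  # assumed cut-off: 80th percentile
print(emma_rank, qualifies_2003)
```

Whether the result is rounded up or to the nearest whole number, it stays below the 80th percentile, so the conclusion does not change.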

Page 13: Data management isu

Correlation Coefficient

Classifying Linear Correlations

Non-Linear Regressions

Cause and Effect

Venn Diagram

Page 14: Data management isu

[Graph A: Effect of Homework Hrs on Mark. x-axis: Homework (hrs), 0 to 10; y-axis: Mark (%), 0 to 100. Curve of best fit: y = 73.941e^(0.0187x), R^2 = 0.2337]

The correlation coefficient was calculated as 0.484 in Excel (approximately the square root of the R-squared value).

Since this number is between 0.33 and 0.67, there is a moderate, positive linear correlation between the number of hours spent on homework and the student’s average mark. Therefore, “Y” tends to increase as “X” increases.

Page 15: Data management isu

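Correlation Coefficient “r”

r = -1: Perfect Negative Linear Correlation
-1 to -0.67: Strong Negative Linear Correlation
-0.67 to -0.33: Moderate Negative Linear Correlation
-0.33 to 0: Weak Negative Linear Correlation
0 to 0.33: Weak Positive Linear Correlation
0.33 to 0.67: Moderate Positive Linear Correlation
0.67 to 1: Strong Positive Linear Correlation
r = 1: Perfect Positive Linear Correlation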
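The classification scale can be written as a small lookup. This is a sketch only: the function name is illustrative, and the treatment of values that fall exactly on 0.33 or 0.67 is an assumption, since the scale does not say which side those boundaries belong to.

```python
from math import sqrt


def classify_r(r):
    """Label a correlation coefficient using the 0.33 / 0.67 cut-offs."""
    strength = abs(r)
    if strength > 1:
        raise ValueError("r must be between -1 and 1")
    if strength == 0:
        return "no linear correlation"
    if strength == 1:
        label = "perfect"
    elif strength >= 0.67:   # assumption: boundary counts as strong
        label = "strong"
    elif strength >= 0.33:   # assumption: boundary counts as moderate
        label = "moderate"
    else:
        label = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction}"


r_from_r_squared = sqrt(0.2337)  # R-squared value shown on Graph A
print(round(r_from_r_squared, 3), classify_r(0.484))
```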

Page 16: Data management isu

[Graph B: Double Cohort Effect on University Applications. x-axis: Year, 1992 to 2004; y-axis: # of Applications, 0 to 600000. Curve of best fit: y = 7608x^2 - 3E+07x + 3E+10, R^2 = 0.8612]

This graph shows the effect of the Double Cohort on the # of applications to University

The curve-of-best-fit shown is a Polynomial Regression

I chose this one because its R-squared value was closest to 1 (which means that the curve fits the data points more closely than the other regression types).

With this information we can predict the number of applications for 2004

Page 17: Data management isu

• Both graphs have a “Cause and Effect” relationship.

• Graph A (Homework Hours vs. Marks) shows this Cause-and-Effect relationship because, generally speaking, the more hours you spend doing homework, the better your mark.

• Graph B shows a Common-Cause Factor because, as the population grows over the years, the number of students grows and more applications are sent in.

Outliers:

• The Double Cohort (the jump in 2003 in Graph B) can be said to be an outlier or an ‘extraneous variable’ because it does not fit with the rest of the data and may skew it

Page 18: Data management isu

50 students took this survey. The results are shown below.

RESULTS:

19 students are in Athletics

7 are on the Student Council

8 participate in School Clubs

2 are involved in both the Athletics and the Student Council

1 student is involved with the Student Council and School Club

2 are part of the Athletics and a School Club

1 student does all three

Construct a Venn Diagram to show the relationships.

Page 19: Data management isu

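Solution:

[Venn diagram: Athletics only = 14; Student Council only = 3; School Clubs only = 5; Athletics and Student Council = 2; Athletics and School Clubs = 2; Student Council and School Clubs = 1; all three = 1; none = 22]

This shows that 56 % of the students surveyed are involved in the school in some way. 10 % participate in two activities and only 1 person (2 %) is engaged in all three.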
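The percentages in the solution can be recomputed from the region counts of the Venn diagram. The assignment of counts to regions below is an assumption read off the diagram (14, 3 and 5 in one activity only; 2, 2 and 1 in exactly two; 1 in all three; 22 in none), and the dictionary keys are illustrative.

```python
surveyed = 50

# Assumed region counts, read from the Venn diagram.
one_activity = {"Athletics": 14, "Student Council": 3, "School Clubs": 5}
two_activities = {
    ("Athletics", "Student Council"): 2,
    ("Athletics", "School Clubs"): 2,
    ("Student Council", "School Clubs"): 1,
}
all_three = 1

involved = sum(one_activity.values()) + sum(two_activities.values()) + all_three
not_involved = surveyed - involved


def percent(count):
    """Share of the surveyed students, as a percentage."""
    return round(100 * count / surveyed, 1)


print(percent(involved), percent(sum(two_activities.values())), percent(all_three))
```

Under this reading the School Clubs circle holds 5 + 2 + 1 + 1 = 9 students, one more than the 8 listed in the results, so either the list or the diagram may contain a slip.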

Page 20: Data management isu

Scenario:

Queen’s University is selecting 5 people for their President’s Scholarship. They are choosing from an eligible group of 4 Grade 12’s and 5 OAC’s.

a) How many ways can they do this?

* Assumption: there are no restrictions and order does not matter *

9 C 5 = 126 ways

b) How many ways can they do this by choosing at least 3 Gr. 12’s?

Ways = (4 C 3 x 5 C 2) + (4 C 4 x 5 C 1) = 40 + 5 = 45
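Both counts can be verified with Python's built-in combination function. The sketch assumes, as the slide does, that order does not matter, and the variable names are illustrative.

```python
from math import comb

grade12_pool, oac_pool, awards = 4, 5, 5

# a) Any 5 of the 9 eligible students.
total_ways = comb(grade12_pool + oac_pool, awards)

# b) At least 3 Grade 12's: exactly 3 or exactly 4 of them,
#    with the remaining places filled from the OAC group.
at_least_three_gr12 = sum(
    comb(grade12_pool, k) * comb(oac_pool, awards - k)
    for k in range(3, grade12_pool + 1)
)

print(total_ways, at_least_three_gr12)
```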

Page 21: Data management isu

Scenario:

Scott applied to 3 Colleges. The probability of a student like him getting into College in 2003 is 75 %. What is the probability of him being accepted to at least one?

X = {0, 1, 2, 3}

* Assumption: this is a “success” / “failure” scenario

P (x) = (n C x) (p^x) (q^(n-x)), where n = 3, p = 0.75 and q = 0.25

P (0) = 0.02 P (1) = 0.14 P (2) = 0.42 P (3) = 0.42

P (x > 0) = P (1) + P (2) + P (3)

= 98 %
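The binomial probabilities can be recomputed with a few lines of Python. This sketch assumes the three applications are independent and each has the same 75 % chance of success. The function name is illustrative.

```python
from math import comb


def binomial_pmf(n, x, p):
    """P(x successes in n trials) = (n C x) * p^x * q^(n - x), q = 1 - p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 3, 0.75
probs = [binomial_pmf(n, x, p) for x in range(n + 1)]

# "At least one" is the complement of "none".
p_at_least_one = 1 - probs[0]

print([round(pr, 2) for pr in probs], round(p_at_least_one, 2))
```

Using the complement, 1 - P(0), gives the same answer as adding P(1), P(2) and P(3).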

Page 22: Data management isu

There is increased competition, especially due to the New Curriculum (the Double Cohort)

Why there may be differences in the marks of the 2 grades:

11 % of Gr.12’s and 23 % of OAC’s are not involved in anything (in or out of school)

55 % of OAC’s and 39 % of Gr. 12’s work and/or volunteer for 13 hrs or more per week

Only 4 % of Grade 12’s don’t work or volunteer while this is true for 14 % of the OAC’s

There are more OAC’s (55 %) than Grade 12’s (43 %) who spend 6 or more hours doing leisure activities a week

Page 23: Data management isu