P C Training & BUSINESS COLLEGE - myPCTBC.co.za in Business... · 2016. 2. 15. · 6.1 What Is A...


... Department of Higher Education as a Private Higher Education Institution under the Higher Education Act, 1997. Registration Certificate No. 2000/HE07/008

FACULTY OF BUSINESS, ECONOMICS & MANAGEMENT SCIENCES

QUALIFICATION TITLE: BACHELOR OF COMMERCE

LEARNER GUIDE

MODULES: MARKETING MANAGEMENT 511 (1ST SEMESTER)

PREPARED ON BEHALF OF
PC TRAINING & BUSINESS COLLEGE (PTY) LTD

AUTHOR: Dr. Lawrence Lekhanya
EDITOR: Mr. Simbrashe Magwagwa
FACULTY HEAD: Prof. Rosh Maharaj

Copyright © 2013 PC Training & Business College (Pty) Ltd
Registration Number: 2000/000757/07
All rights reserved; no part of this publication may be reproduced in any form or by any means, including photocopying machines, without the written permission of the Institution.

BUSINESS ADMINISTRATION, MANAGEMENT & COMMERCIAL SCIENCES

BUSINESS STATISTICS 621

Year 2 Semester 1


Previously

BUSINESS ADMINISTRATION, MANAGEMENT & COMMERCIAL SCIENCES

LEARNER GUIDE

MODULE: BUSINESS STATISTICS 621 (1ST SEMESTER)

Copyright © 2016 Richfield Graduate Institute of Technology (Pty) Ltd
Registration Number: 2000/000757/07

All rights reserved; no part of this publication may be reproduced in any form or by any means, including photocopying machines, without the written permission of the Institution.


FACULTY OF BUSINESS, ECONOMICS & MANAGEMENT SCIENCES

LEARNER GUIDE

MODULE: BUSINESS STATISTICS 621 (1ST SEMESTER)

PREPARED ON BEHALF OF

RICHFIELD GRADUATE INSTITUTE OF TECHNOLOGY

Copyright © 2016 PC Training & Business College (Pty) Ltd Registration Number: 2000/000757/07

All rights reserved; no part of this publication may be reproduced in any form or by any means, including photocopying machines, without the written permission of the Institution.


TABLE OF CONTENTS

TOPICS

Section A: Preface

1. Welcome
2. Title of Modules
3. Purpose of Module
4. Learning Outcomes
5. Method of Study
6. Lectures and Tutorials
7. Notices
8. Prescribed & Recommended Material
9. Assessment & Key Concepts in Assignments and Examinations
10. Specimen Assignment Cover Sheet
11. Work Readiness Programme
12. Work Integrated Learning

Section B:

TOPIC 1: INTRODUCTION TO DESCRIPTIVE STATISTICS
1.1 What Is Statistics?
1.2 Descriptive Statistics
1.3 Inferential Statistics
1.4 Variables
1.5 Parameters
1.6 Summation Notation
1.7 Measurement Scales

TOPIC 2: DESCRIBING UNIVARIATE DATA
2.1 Central Tendency
2.2 Mean
2.3 Median
2.4 Mode
2.5 Spread
2.6 Range
2.7 Semi-Interquartile Range
2.8 Variance
2.9 Standard Deviation
2.10 Shape Of The Distribution
2.11 Skewness
2.12 Kurtosis
2.13 Types Of Graphs

TOPIC 3: CORRELATION AND SIMPLE LINEAR REGRESSION ANALYSIS
3.1 Scatter Plots
3.2 Introduction To Pearson's Correlation
3.3 Regression Analysis

TOPIC 4: INTRODUCTION TO PROBABILITY
4.1 Simple Probability
4.2 Conditional Probability
4.3 Probability Of A And B
4.4 Probability Of A Or B

TOPIC 5: DISCRETE PROBABILITY DISTRIBUTION
5.1 Permutations And Combinations
5.2 Binomial Probability Distribution
5.3 The Poisson Distribution
Assessment Questions

TOPIC 6: CONTINUOUS PROBABILITY DISTRIBUTION
6.1 What Is A Normal Distribution
6.2 The Standard Normal Distribution
6.3 Converting To Percentiles And Back
6.4 Area Under Portions Of The Curve
Assessment Questions

TOPIC 7: ADDENDUM 621 (A): REVIEW QUESTIONS

TOPIC 8: ADDENDUM 621 (B): TYPICAL EXAMINATION QUESTIONS

1. WELCOME

Welcome to the Faculty of Business, Economics & Management Sciences at PC Training & Business College. We trust you will find the contents and learning outcomes of this module both interesting and insightful as you begin your academic journey and eventually your career in the business world. This section of the study guide is intended to orientate you to the module before the commencement of formal lectures.

The following lectures will focus on the study units described.

SECTION A: WELCOME & ORIENTATION

Study unit 1: Orientation Programme (Lecture 1)
Introducing academic staff to the students by academic head.
Introduction of institution policies.

Study unit 2: Orientation of students to Library and Students Facilities (Lecture 2)
Introducing students to physical structures.
Issuing of foundation learner guides and necessary learning material.

Study unit 3: Distribution and Orientation of Business Statistics Learner Guides, Textbooks and Prescribed Materials (Lecture 3)

Study unit 4: Discussion on the Objectives and Outcomes of Business Statistics 621 (Lecture 4)


2. TITLE OF MODULES, COURSE, CODE, NQF LEVEL, CREDITS & MODE OF DELIVERY

Semester: 1
Title of Module: Business Statistics 621
Code: BUS_621
NQF level: 6
Credits: 10
Mode of delivery: Contact/Distance

3. PURPOSE OF THE MODULES

This introductory course covers the concepts and techniques of exploratory data analysis, frequency distributions, central tendency and variation, probability, sampling, inference, regression and correlation. Students will be exposed to these topics and to how each applies to, and can be used in, the business environment. Students will master problem solving using both manual computations and statistical software.

4. LEARNING OUTCOMES

On completion of this module the student will be able to:

Appreciate the role of statistics in management decision making.
Develop an intuitive understanding of the techniques by giving an explanation for each method and an interpretation of the solutions.
Have a general understanding of basic probability concepts.
Understand the statistical measures which condense and describe the characteristics of raw data.

5. METHOD OF STUDY

The sections that have to be studied are indicated under each topic. These form the basis for tests, assignments and examination. To be able to do the activities and assignments for this module, to achieve the learning outcomes and ultimately to be successful in the tests and examination, you will need an in-depth understanding of the content of these sections in the learning guide and prescribed book. In order to master the learning material, you must accept responsibility for your own studies. Learning is not the same as memorizing. You are expected to show that you understand and are able to apply the information. Use will also be made of lectures, tutorials, case studies and group discussions to present this module.

Study unit 5: Orientation and guidelines to completing Assignments (Lecture 5)
Review and Recap of Study units 1-4

Section B: Business Statistics 621 (1st Semester)


6. LECTURES AND TUTORIALS

Students must refer to the notice boards on their respective campuses for details of the lecture and tutorial timetables. The lecturer assigned to the module will also inform you of the number of lecture periods and tutorials allocated to a particular module. Prior preparation is required for each lecture and tutorial. Students are encouraged to participate actively in lectures and tutorials in order to ensure success in tests, assignments and examinations.

7. NOTICES

All information pertaining to this module, such as test dates, lecture and tutorial timetables, assignments and examinations, will be displayed on the notice board located on your campus. Students must check the notice board on a daily basis. Should you require any clarity, please consult your lecturer, programme manager or administrator on your respective campus.

8. PRESCRIBED & RECOMMENDED MATERIAL

8.1 Prescribed Material

Wegner, T. 2016. Applied Business Statistics: Methods and Excel-based Applications. 4th edition. Cape Town: Juta & Co. Ltd.

Business Statistics 621 takes a well-balanced approach: it not only informs and educates you about the theoretical background required in the business world, but also has a strong practical component. Our practical syllabus is closely aligned with the management principles and standards currently employed by many enterprises.

8.2 Recommended Material

Willemse, I. and Nyelisani, P. 2015. Statistical Methods and Calculation Skills. 4th ed. Cape Town: Juta & Company Ltd.

8.3 Independent Research

The student is encouraged to undertake independent research with emphasis on the presentation and interpretation of the data collected.

8.4 Library Infrastructure

The following services are available to you:

Each campus keeps a limited quantity of the recommended reading titles and a larger variety of similar titles which you may borrow. Please note that students are required to purchase the prescribed materials.

Arrangements have been made with municipal, state and other libraries to stock our recommended reading and similar titles. You may use these on their premises or borrow them if available. It is your responsibility to keep all library books safe.

PCT&BC has also allocated one library period per week to assist you with your formal research under professional supervision.

PCT&BC has dedicated electronic libraries for use by its students. The computer laboratories, when not in use for academic purposes, may also be used for research purposes. Booking is essential for all electronic library usage.

9. ASSESSMENT


Final assessment for this module will comprise two CA tests, an assignment and an examination. Your lecturer will inform you of the dates, times and venues for each of these. You may also refer to the notice board on your campus or the Academic Calendar, which is displayed in all lecture rooms.

9.1 CA Tests

There are two compulsory tests for each module (in each semester).

9.2 Assignment

There is one compulsory assignment for each module in each semester. Your lecturer will inform you of the assessment questions at the commencement of this module. It is therefore necessary to study on an ongoing basis.

9.3 Examination

There is one two-hour examination for each module. Make sure that you diarize the correct date, time and venue. The examinations faculty will notify you of your results once all administrative matters are cleared and fees are paid up.

The examination may consist of multiple choice questions, short questions and essay type questions. This requires you to be thoroughly prepared, as all the content matter of lectures, tutorials, all references to the prescribed text and any other additional documentation or reference materials is examinable in both your tests and the examinations. The examinations faculty will make available to you the details of the examination (date, time and venue) in due course. You must be seated in the examination room 15 minutes before the commencement of the examination. If you arrive late, you will not be allowed any extra time. Your learner registration card must be in your possession at all times.

9.4 Final Assessment

The final assessment for this module will be weighted as follows:

Continuous Assessment Test 1, Continuous Assessment Test 2 and Assignment 1: 40%
Examination: 60%
Total: 100%

9.5 Key Concepts in Assignments and Examinations

In assignment and examination questions you will notice certain key concepts (i.e. words/verbs) which tell you what is expected of you. For example, you may be asked in a question to list, describe, illustrate, demonstrate, compare, construct, relate, criticize, recommend or design particular information, aspects, factors or situations. To help you understand exactly what these key concepts or verbs mean, and therefore what is expected of you, we present the following taxonomy by Bloom, explaining the concepts and stating the level of cognitive thinking that these refer to.

Competence: Knowledge
Skills Demonstrated: observation and recall of information; knowledge of dates, events, places; knowledge of major ideas; mastery of subject matter
Question Cues: list, define, tell, describe, identify, show, label, collect, examine, tabulate, quote, name, who, when, where, etc.

Competence: Comprehension
Skills Demonstrated: understanding information; grasp meaning; translate knowledge into new context; interpret facts, compare, contrast; order, group, infer causes; predict consequences
Question Cues: summarize, describe, interpret, contrast, predict, associate, distinguish, estimate, differentiate, discuss, extend

Competence: Application
Skills Demonstrated: use information; use methods, concepts, theories in new situations; solve problems using required skills or knowledge
Question Cues: apply, demonstrate, calculate, complete, illustrate, show, solve, examine, modify, relate, change, classify, experiment, discover

Competence: Analysis
Skills Demonstrated: seeing patterns; organization of parts; recognition of hidden meanings; identification of components
Question Cues: analyze, separate, order, explain, connect, classify, arrange, divide, compare, select, infer

Competence: Synthesis
Skills Demonstrated: use old ideas to create new ones; generalize from given facts; relate knowledge from several areas; predict, draw conclusions
Question Cues: combine, integrate, modify, rearrange, substitute, plan, create, design, invent, what if?, compose, formulate, prepare, generalize, rewrite

Competence: Evaluation
Skills Demonstrated: compare and discriminate between ideas; assess value of theories, presentations; make choices based on reasoned argument; verify value of evidence; recognize subjectivity
Question Cues: assess, decide, rank, grade, test, measure, recommend, convince, select, judge, explain, discriminate, support, conclude, compare, summarize

10. Specimen Assignment Cover Sheet

FACULTY OF BUSINESS, ECONOMICS & MANAGEMENT SCIENCES

BUSINESS STATISTICS 621 ASSIGNMENT COVER SHEET

1ST SEMESTER ASSIGNMENT

Name & Surname: ______________________________ ICAS No: _________________

Qualification: ______________________ Semester: _____

Module Name: __________________________

Specialization: _____________________ Date Submitted: ___________

QUESTION NUMBER | MARK ALLOCATION | EXAMINER MARKS | MODERATOR MARKS

TOTAL

Examiner’s Comments:

Moderator’s Comments:

Signature of Examiner: Signature of Moderator:

The purpose of an assignment is to ensure that the student is able to:

make informed decisions based on data

correctly apply a variety of statistical procedures and tests

know the uses, capabilities and limitations of various statistical procedures

interpret the results of statistical procedures and tests


Instructions and guidelines for writing assignments

1. Use the correct cover page provided by the institution.

2. All essay type assignments must include the following:

2.1 Table of contents

2.2 Introduction

2.3 Main body with subheadings

2.4 Conclusions and recommendations

2.5 Bibliography

3. The entire assignment must be a minimum of 5 pages long, preferably typed in font size 12.

3.1 The quality of work submitted is more important than the number of assigned pages.

4. Copying is a serious offence which attracts a severe penalty and must be avoided at all costs. If any learner transgresses this rule, the lecturer will retain the assignments and ask the affected students to submit a new assignment, which will be capped at 50%.

5. Use the Harvard referencing method.

ASSESSMENT CRITERIA

When the final mark is calculated the following criteria must be taken into account:

1. READING AND KNOWLEDGE OF SUBJECT MATTER

Wide reading and comprehensive knowledge in the application of theory.

2. UNDERSTANDING, ANALYSIS AND ARGUMENT

Complete and perceptive awareness of issues and clear grasp of their wider significance. Clear evidence of independent thought and ability to defend a position logically and convincingly.

3. ORGANISATION AND PRESENTATION

Careful thought given to arrangement and development of material and argument.

Good English with appropriate referencing and comprehensive bibliography.

ASSIGNMENT GUIDELINES

The purpose of an assignment is to ensure that the student is able to:

Interpret, convert and evaluate text.

Have a sound understanding of key fields, viz. principles and theories, rules and concepts, and an awareness of how they relate to cognate areas.

Solve unfamiliar problems using correct procedures and corrective actions.

Investigate and critically analyse information and report thereof.

Present information using Information Technology.

Present and communicate information reliably and coherently.

Develop information retrieval skills.

Use methods of enquiry and research in a disciplined field.


ASSESSMENT CRITERIA

When the final mark is allocated the above criteria must be taken into account:

A. Content and relevance: Has the learner answered the question?

B. Research (a minimum of TEN SOURCES is recommended): reference books, Internet, newspapers, textbooks.

C. Presentation: introduction, body, conclusion, paragraphs, neatness, integration, grammar and spelling, page numbering, diagrams, tables, graphs, bibliography.

NB: All assignments are compulsory as they form part of continuous assessment that counts towards the final mark.

11. WORK READINESS PROGRAMME (WRP)

In order to prepare students for the world of work, a series of interventions over and above the formal curriculum are concurrently implemented. These include:

Soft skills
Employment skills
Life skills
End-User Computing (if not included in your curriculum)

The outline below lists some of the key concepts for Work Readiness that will be included in your timetable. It is in your interest to attend these workshops, complete the Work Readiness Log Book and prepare for the working world.

WORK READINESS PROGRAMME

SOFT SKILLS
Time Management
Working in Teams
Problem Solving Skills
Attitude & Goal Setting
Etiquettes & Ethics
Communication Skills

LIFE SKILLS
Manage Personal Finance
Driving Skills
Basic Life Support & First Aid
Entrepreneurial Skills
Counseling Skills

EMPLOYMENT SKILLS
CV Writing
Interview Skills
Presentation Skills
Employer / Employee Relationship

END USER COMPUTING
Email & E-Commerce
Spreadsheets
Database
Presentation
Office Word


12. WORK INTEGRATED LEARNING (WIL)

Work Integrated Learning forms a core component of the curriculum for the completion of this programme. All modules for this qualification will be assessed in an integrated manner towards the end of the programme or after completion of all other modules. Prerequisites for placement with employers will include:

Completion of all tests and assignments
Success in examination
Payment of all arrear fees
Return of library books, etc.
Completion of the Work Readiness Programme

Students will be fully inducted on the Work Integrated Learning Module, the Workbooks and assessment requirements before placement with employers. The partners in Work Readiness Programme (WRP) include:

SECTION B


LEARNER GUIDE MODULE: BUSINESS STATISTICS 621, 1st SEMESTER

TOPIC 1: INTRODUCTION TO DESCRIPTIVE STATISTICS
TOPIC 2: DESCRIBING UNIVARIATE DATA
TOPIC 3: CORRELATION AND SIMPLE LINEAR REGRESSION ANALYSIS
TOPIC 4: INTRODUCTION TO PROBABILITY
TOPIC 5: DISCRETE PROBABILITY DISTRIBUTION
TOPIC 6: CONTINUOUS PROBABILITY DISTRIBUTION
ADDENDUM 621 (A): REVIEW QUESTIONS
ADDENDUM 621 (B): TYPICAL EXAMINATION QUESTIONS

TOPIC 1: INTRODUCTION TO DESCRIPTIVE STATISTICS
1.1 What Is Statistics?
1.2 Descriptive Statistics
1.3 Inferential Statistics
1.4 Variables (Lecture 7)
1.5 Parameters
1.6 Summation Notation (Lecture 8)
1.7 Measurement Scales

TOPIC 2: DESCRIBING UNIVARIATE DATA (Lecture 9-20)
2.1 Central Tendency
2.2 Mean
2.3 Median
2.4 Mode
2.5 Spread
2.6 Range
2.7 Semi-Interquartile Range
2.8 Variance
2.9 Standard Deviation
2.10 Shape Of The Distribution
2.11 Skewness
2.12 Kurtosis
2.13 Types Of Graphs
Assessment questions

TOPIC 3: CORRELATION AND SIMPLE LINEAR REGRESSION ANALYSIS (Lecture 21-25)
3.1 Scatter Plots
3.2 Introduction To Pearson's Correlation
3.3 Regression Analysis
Assessment questions

TOPIC 4: INTRODUCTION TO PROBABILITY (Lecture 32-35)
4.1 Simple Probability
4.2 Conditional Probability
4.3 Probability Of A And B
4.4 Probability Of A Or B

TOPIC 5: DISCRETE PROBABILITY DISTRIBUTION (Lecture 36-37)
5.1 Permutations And Combinations
5.2 Binomial Probability Distribution
5.3 The Poisson Distribution

TOPIC 6: CONTINUOUS PROBABILITY DISTRIBUTION (Lecture 38-41)
6.1 What Is A Normal Distribution
6.2 The Standard Normal Distribution
6.3 Converting To Percentiles And Back
6.4 Area Under Portions Of The Curve

TOPIC 7: ADDENDUM 621 (A): REVIEW QUESTIONS

TOPIC 8: ADDENDUM 621 (B): TYPICAL EXAMINATION QUESTIONS

The following are guide icons that will be used throughout this learner guide:


Icon Description

Learning Outcomes

Study

Read

Writing Activity

Think Point

Research

Glossary

Key Points

Review Question


Case Study

Bright Idea

Problem(s)

Multimedia Resource

Web Resource


TOPIC 1

1. INTRODUCTION TO DESCRIPTIVE STATISTICS

Learning Outcomes:

In this topic you will learn:

What the term "statistics" means and how it is used.
The two types of statistics, namely descriptive and inferential.
How to use variables and parameters.
The different measurement scales: nominal, ordinal, interval and ratio.

1.1 WHAT IS STATISTICS?

The word "statistics" is used in several different senses. In the broadest sense, "statistics" refers to a range of techniques and procedures for analyzing data, interpreting data, displaying data, and making decisions based on data. This is what courses in "statistics" generally cover.

In a second usage, a "statistic" is defined as a numerical quantity (such as the mean) calculated from a sample. Such statistics are used to estimate parameters.

The term "statistics" sometimes refers to calculated quantities regardless of whether or not they are from a sample. For example, one might ask about a baseball player's statistics and be referring to his or her batting average, runs batted in, number of home runs, etc. Or, "government statistics" can refer to any numerical indexes calculated by a governmental agency.

Although the different meanings of "statistics" have the potential for confusion, a careful consideration of the context in which the word is used should make its intended meaning clear.

1.2 DESCRIPTIVE STATISTICS

One important use of statistics is to summarize a collection of data in a clear and understandable way. For example, assume a psychologist gave a personality test measuring shyness to all 2500 students attending a small college. How might these measurements be summarized? There are two basic methods: numerical and graphical.

Using the numerical approach one might compute statistics such as the mean and standard deviation. These statistics convey information about the average degree of shyness and the degree to which people differ in shyness. Using the graphical approach one might create a stem and leaf display and a box plot. These plots contain detailed information about the distribution of shyness scores.

Graphical methods are better suited than numerical methods for identifying patterns in the data. Numerical approaches are more precise and objective. Since the numerical and graphical approaches complement each other, it is wise to use both.

1.3 INFERENTIAL STATISTICS

Inferential statistics are used to draw inferences about a population from a sample. Consider an experiment in which 10 subjects who performed a task after 24 hours of sleep deprivation scored 12 points lower than 10 subjects who performed after a normal night's sleep. Is the difference real or could it be due to chance? How much larger could the real difference be than the 12 points found in the sample? These are the types of questions answered by inferential statistics.

There are two main methods used in inferential statistics: estimation and hypothesis testing. In estimation, the sample is used to estimate a parameter and a confidence interval about the estimate is constructed. In the most common use of hypothesis testing, a "straw man" null hypothesis is put forward and it is determined whether the data are strong enough to reject it. For the sleep deprivation study, the null hypothesis would be that sleep deprivation has no effect on performance.

Population: A population consists of an entire set of objects, observations, or scores that have something in common. For example, a population might be defined as all males between the ages of 15 and 18. Some populations are only hypothetical. Consider an experimenter interested in the possible effectiveness of a new method of teaching reading. He or she might define a population as the reading achievement scores that would result if all six-year-olds in the US were taught with this new method. The population is hypothetical in the sense that there does not exist a group of students who have been taught using the new method; the population consists of the scores that would be obtained if they were taught with this method. The distribution of a population can be described by several parameters such as the mean and standard deviation. Estimates of these parameters taken from a sample are called statistics.

Sample: A sample is a subset of a population. Since it is usually impractical to test every member of a population, a sample from the population is typically the best approach available.
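The question "is the difference real or could it be due to chance?" can be made concrete with a short sketch. The scores below are made up purely for illustration (five subjects per group rather than the ten in the example), and the exact permutation test used here is only one of several ways to test the null hypothesis of no effect; the function name `permutation_p_value` is likewise hypothetical.

```python
from itertools import combinations

# Hypothetical task scores, invented for illustration only.
deprived = [40, 42, 45, 47, 50]   # after 24 hours without sleep
rested = [52, 55, 57, 60, 62]     # after a normal night's sleep


def permutation_p_value(group_a, group_b):
    """One-sided exact permutation test for mean(group_b) - mean(group_a).

    Under the null hypothesis the group labels are arbitrary, so every way
    of splitting the pooled scores into two groups is equally likely.
    """
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = sum(group_b) / n_b - sum(group_a) / n_a

    at_least_as_large = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = (total - sum_a) / n_b - sum_a / n_a
        n_splits += 1
        if diff >= observed - 1e-12:   # small tolerance for float rounding
            at_least_as_large += 1
    return observed, at_least_as_large / n_splits


observed_diff, p_value = permutation_p_value(deprived, rested)
print(f"Observed difference in means: {observed_diff:.1f}")
print(f"Share of relabellings with a difference at least this large: {p_value:.4f}")
```

A small share means that a difference this large would rarely arise from chance relabelling alone, which counts as evidence against the null hypothesis.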


1.4 VARIABLES
A variable is any measured characteristic or attribute that differs for different subjects. For example, if the weight of 30 subjects were measured, then weight would be a variable.

Quantitative and Qualitative
Variables can be quantitative or qualitative. Qualitative variables are sometimes called "categorical variables". Quantitative variables are measured on an ordinal, interval, or ratio scale; qualitative variables are measured on a nominal scale. If five-year-old subjects were asked to name their favourite colour, then the variable would be qualitative. If the time it took them to respond were measured, then the variable would be quantitative.

Independent and Dependent Variables
When an experiment is conducted, some variables are manipulated by the experimenter and others are measured from the subjects. The former are called "independent variables" or "factors"; the latter are called "dependent variables" or "dependent measures". For example, consider a hypothetical experiment on the effect of drinking alcohol on reaction time: subjects drank water, one beer, three beers, or six beers and then had their reaction times to the onset of a stimulus measured. The independent variable would be the number of beers drunk (0, 1, 3, or 6) and the dependent variable would be reaction time.

Continuous and Discrete Variables
Some variables (such as reaction time) are measured on a continuous scale. There are an infinite number of possible values these variables can take on. Other variables can only take on a limited number of values. For example, if a dependent variable were a subject's rating on a five-point scale where only the values 1, 2, 3, 4, and 5 were allowed, then only five possible values could occur. Such variables are called "discrete" variables.

Nominal: Nominal measurement consists of assigning items to groups or categories. No quantitative information is conveyed and no ordering of the items is implied. Nominal scales are therefore qualitative rather than quantitative. Religious preference, race, and sex are all examples of nominal scales. Frequency distributions are usually used to analyze data measured on a nominal scale. The main statistic computed is the mode. Variables measured on a nominal scale are often referred to as categorical or qualitative variables.

Ordinal: Measurements with ordinal scales are ordered in the sense that higher numbers represent higher values. However, the intervals between the numbers are not necessarily equal. For example, on a five-point rating scale measuring attitudes toward gun control, the difference between a rating of 2 and a rating of 3 may not represent the same difference as the difference between a rating of 4 and a rating of 5. There is no "true" zero point for ordinal scales since the zero point is chosen arbitrarily. The lowest point on the rating scale in the example was arbitrarily chosen to be 1. It could just as well have been 0 or -5.

Interval: On interval measurement scales, one unit on the scale represents the same magnitude on the trait or characteristic being measured across the whole range of the scale. For example, if anxiety were measured on an interval scale, then a difference between a score of 10 and a score of 11 would represent the same difference in anxiety, as would a


difference between a score of 50 and a score of 51. Interval scales do not have a "true" zero point, however, and therefore it is not possible to make statements about how many times higher one score is than another. For the anxiety scale, it would not be valid to say that a person with a score of 30 was twice as anxious as a person with a score of 15. True interval measurement is somewhere between rare and nonexistent in the behavioural sciences. No interval-level scale of anxiety such as the one described in the example actually exists. A good example of an interval scale is the Fahrenheit scale for temperature. Equal differences on this scale represent equal differences in temperature, but a temperature of 30 degrees is not twice as warm as one of 15 degrees.

Ratio: Ratio scales are like interval scales except they have true zero points. A good example is the Kelvin scale of temperature. This scale has an absolute zero. Thus, a temperature of 300 Kelvin is twice as high as a temperature of 150 Kelvin.

1.5 PARAMETERS
A parameter is a numerical quantity measuring some aspect of a population of scores. For example, the population mean is a parameter that measures central tendency. Greek letters are used to designate parameters. The table below shows several parameters of great importance in statistical analyses and the Greek symbol that represents each one. Parameters are rarely known and are usually estimated by statistics computed in samples. To the right of each Greek symbol is the symbol for the associated statistic used to estimate it from a sample.

Quantity              Parameter   Statistic
Mean                  μ           x̄
Standard deviation    σ           s
Proportion            π           p
Correlation           ρ           r

Central tendency: Measures of central tendency are measures of the location of the middle or the centre of a distribution. The definition of "middle" or "centre" is purposely left somewhat vague so that the term "central tendency" can refer to a wide variety of measures. The mean is the most commonly used measure of central tendency. The following measures of central tendency are discussed in this text:

Mean

Median

Mode

1.6 SUMMATION NOTATION
The Greek letter Σ (a capital sigma) is used to designate summation. For example, suppose an experimenter measured the performance of four subjects on a memory task. Subject 1's score will be referred to as X1, Subject 2's as X2, and so on. The scores are shown below:

Subject   Score
1         7
2         6
3         5
4         8


The way to use the summation sign to indicate the sum of all four X's is:

$$\sum_{i=1}^{4} X_i$$

This notation is read as follows: Sum the values of X from X1 through X4. The index i (shown just under the Σ sign) indicates which values of X are to be summed. The index i takes on values beginning with the value to the right of the "=" sign (1 in this case) and continues sequentially until it reaches the value above the Σ sign (4 in this case). Therefore i takes on the values 1, 2, 3, and 4, and the values of X1, X2, X3, and X4 are summed (7 + 6 + 5 + 8 = 26). In order to make formulas more general, variables can be used with the summation notation. For example,

$$\sum_{i=1}^{N} X_i$$

means to sum up values of X from 1 to N, where N can be any number but usually indicates the sample size. Often an abbreviated form of the summation notation is used. For example, ΣX means to sum all the values of X. When only a subset of the values of X is to be summed, the full version is required. Thus, the sum of all elements of X except the first and the last (the Nth) would be indicated as:

$$\sum_{i=2}^{N-1} X_i$$

which would be read as the sum of X with i going from 2 to N-1. Some formulas require that each number be squared before the numbers are summed. This is indicated by:

$$\sum_{i=1}^{N} X_i^2$$

and, for the four scores above, is equal to 7² + 6² + 5² + 8² = 174. The abbreviated version is simply ΣX². It is very important to note that it makes a big difference whether the numbers are squared first and then summed or summed first and then squared. The symbol (ΣX)² indicates that the numbers should be summed first and then squared. For the present example, this equals (7 + 6 + 5 + 8)² = 26² = 676. This, of course, is quite different from 174.
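The difference between squaring before summing and summing before squaring can be checked with a few lines of Python. This is only an illustrative sketch using the four memory-task scores from the text; the variable names are chosen here and are not part of the guide.

```python
# Scores of the four subjects in the memory-task example (X1 to X4).
scores = [7, 6, 5, 8]

sum_x = sum(scores)                          # ΣX: add the raw scores
sum_x_squared = sum(x ** 2 for x in scores)  # ΣX²: square each score, then add
square_of_sum = sum(scores) ** 2             # (ΣX)²: add first, then square

# Sum of all elements except the first and the last (i from 2 to N-1).
inner_sum = sum(scores[1:-1])

print(sum_x, sum_x_squared, square_of_sum, inner_sum)
```

The slice `scores[1:-1]` plays the role of the limits 2 and N-1 on the summation sign.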

Sometimes a formula requires that the sum of cross products be computed. For instance, if 3 subjects were each tested twice, they might each have a score on X and on Y.


Subject   X   Y
1         2   3
2         1   6
3         4   5

The sum of cross products (2 × 3) + (1 × 6) + (4 × 5) = 32 can be represented in summation notation simply as: ΣXY.

Basic Theorems
The following data will be used to illustrate the theorems:

X   Y
3   8
2   3
4   1

Σ(X + Y) = ΣX + ΣY
Σ(X + Y) = 11 + 5 + 5 = 21
ΣX = 3 + 2 + 4 = 9
ΣY = 8 + 3 + 1 = 12
ΣX + ΣY = 9 + 12 = 21

ΣaX = aΣX (a is a constant)
For example, let a = 2.
ΣaX = (2)(3) + (2)(2) + (2)(4) = 18
aΣX = (2)(9) = 18

Σ(X - x̄)² = ΣX² - (ΣX)²/N
(N is the sample size, 3 in this case, and x̄ is the mean, which is also equal to 3 in this case.)
Σ(X - x̄)² = (3-3)² + (2-3)² + (4-3)² = 2
ΣX² = 3² + 2² + 4² = 29
(ΣX)²/N = 9²/3 = 27
ΣX² - (ΣX)²/N = 29 - 27 = 2

1.7 MEASUREMENT SCALES
Measurement is the assignment of numbers to objects or events in a systematic fashion. Four levels of measurement scales are commonly distinguished: nominal, ordinal, interval, and ratio. There is a relationship between the level of measurement and the appropriateness of various statistical procedures.


For example, it would be silly to compute the mean of nominal measurements. However, the appropriateness of statistical analyses involving means for ordinal level data has been controversial. One position is that data must be measured on an interval or a ratio scale for the computation of means and other statistics to be valid. Therefore, if data are measured on an ordinal scale, the median but not the mean can serve as a measure of central tendency. The arguments on both sides of this issue will be examined in the context of a hypothetical experiment designed to determine whether people prefer to work with colour or with black and white computer displays. Twenty subjects viewed black and white displays and 20 subjects viewed colour displays. Displays were rated on a 7 point scale where a 1 was the lowest rating and a 7 was the highest rating. This rating scale is only an ordinal scale since there is no assurance that the difference between a rating of 1 and a rating of 2 represents the same degree of difference in preference as the difference between a rating of 5 and a rating of 6. The mean rating of the colour display was 5.5 and the mean rating of the black and white display was 3.9. The first question the experimenter would ask is how likely is it that this big a difference between means could have occurred just because of chance factors such as which subjects saw the black and white display and which subjects saw the colour display. Standard methods of statistical inference can answer this question. Assume these methods led to the conclusion that the difference was not due to chance but represented a "real" difference in means. Does the fact that the rating scale was ordinal instead of interval have any implications for the validity of the statistical conclusion that the difference between means was not due to chance? The answer is an unequivocal "NO." There is really no room for argument here. 
What can be questioned, however, is whether it is worth knowing that the mean rating of colour displays is higher than the mean rating for black and white displays. The argument that it is not worth knowing assumes that means of ordinal data are meaningless. Supporting the notion that means of ordinal data are meaningless is the fact that examples (see below) can be made up showing that a difference between means on an ordinal scale can be in the opposite direction of what it would have been if the "true" measurement scale had been used. If means of ordinal data are meaningless, why should anyone care whether the difference between two meaningless quantities (the two means) is due to chance or not?

Naturally enough, the answer lies in challenging the proposition that means of ordinal data are meaningless. There are two counterarguments to the example showing that using an ordinal scale can reverse the direction of the difference between means. The first is philosophical and challenges the validity of the notion that there is some unseen "true" measurement scale that is only being approximated by the rating scale. The second counterargument accepts the notion of an underlying scale but considers the examples to be very contrived and unlikely to occur in real data. Measurement scales used in behavioural research are invariably somewhere between ordinal and interval scales. In the preference


experiment, it may not be the case that the difference between the ratings one and two is exactly the same as the difference between five and six, but it is unlikely to be many times larger either. The scale is roughly interval and it is exceedingly unlikely that the means on this scale would favour colour displays while the means on the "true" scale would favour the black and white displays.

There are some cases where one can validly argue that the use of an ordinal instead of a ratio scale seriously distorts the conclusions. Consider an experiment designed to determine whether 5-year-old children are more distractible than 10-year-old children. Children of both ages perform a memory task once with and once without distraction. The means are given below:

                  5-yr   10-yr
No Distraction    6      12
Distraction       3      8

It looks as though the 10-year-olds are more distractible since distraction cost them 4 points but only cost the 5-year-olds 3 points. However, it might be that a change from 3 to 6 represents a larger difference than a change from 8 to 12. Consider that the performance of 5-year-olds dropped 50% from distraction but the performance of 10-year-olds dropped only 33%. Which age group is "really" more distractible? Unfortunately, there is no clearly right or wrong answer. If proportional change is considered, then 5-year-olds are more distractible; if the amount of change is considered, then 10-year-olds are more distractible.

Keep in mind that statistical conclusions are not affected by the choice of measurement scale even though the all-important interpretation of these conclusions can be. In this example, a statistical test could validly rule out chance as an explanation of the finding that 10-year-olds lost more points from distraction than did 5-year-olds. However, the statistical test will not reveal whether a greater drop necessarily means 10-year-olds are more distractible. So the conclusion that distraction costs 10-year-olds more points than it costs 5-year-olds is valid. The interpretation depends on measurement issues.

In summary, statistical analyses provide conclusions about the numbers entered into them. Relating these conclusions to the substantive research issues depends on the measurement operations.

Examples: Assume there were a "true" measurement scale for job satisfaction and that it maps onto a 7-point rating scale as follows:

"True scale"   7-point scale
1-5            1
6-40           2
41-42          3
43-75          4
76-90          5
91-94          6
95-100         7


Thus if someone's "true" job satisfaction were 55 he or she would have a rated score of 4. Now consider the following two sets of job satisfaction scores:

        Group A                Group B
        True Scale   Rating    True Scale   Rating
        6            2         5            1
        6            2         40           2
        43           4         74           4
        91           6         90           5
        95           7         100          7
Mean    48.2         4.2       61.8         3.8

On the "true" scale the mean for Group B is 61.8, which is much higher than the mean for Group A, which is 48.2. However, on the 7-point rating scale, the mean for B is only 3.8, which is lower than the mean for A of 4.2.
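The reversal in this example can be reproduced with a short Python sketch. The mapping bands and the two groups of scores come from the text; the function name `to_rating` and the way the bands are stored are assumptions made only for illustration.

```python
from statistics import mean

# Upper bound of each "true scale" band and the rating it maps to,
# copied from the mapping table in the text.
BANDS = [(5, 1), (40, 2), (42, 3), (75, 4), (90, 5), (94, 6), (100, 7)]

def to_rating(true_score):
    """Return the 7-point rating for a "true" score between 1 and 100."""
    if true_score < 1:
        raise ValueError("true score must be between 1 and 100")
    for upper_bound, rating in BANDS:
        if true_score <= upper_bound:
            return rating
    raise ValueError("true score must be between 1 and 100")

groups = {
    "A": [6, 6, 43, 91, 95],
    "B": [5, 40, 74, 90, 100],
}

# For each group, store (mean on the true scale, mean on the rating scale).
means = {}
for name, true_scores in groups.items():
    ratings = [to_rating(score) for score in true_scores]
    means[name] = (mean(true_scores), mean(ratings))
    print(name, means[name])
```

Group B has the higher mean on the "true" scale but the lower mean on the rating scale, because the bands of the rating scale are of very unequal width.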

Problems:

1. A teacher wishes to know whether the males in his or her class have more favourable attitudes toward gun control than do the females. All students in the class are given a questionnaire about gun control and the mean responses of the males and the females are compared. Is this an example of descriptive or inferential statistics?

2. A medical researcher is testing the effectiveness of a new drug for treating Parkinson's disease. Ten subjects with the disease are given the new drug and 10 are given a placebo. Improvement in symptomology is measured. What would be the roles of descriptive and inferential statistics in the analysis of these data?

3. What are the advantages and disadvantages of graphical as opposed to numerical approaches to descriptive statistics?

4. Distinguish between random and stratified sampling.

5. A study is conducted to determine whether people learn better with spaced or massed practice. Subjects volunteer from an introductory psychology class. The first 10 subjects who volunteer are assigned to the massed-practice condition; the next 10 are assigned to the spaced-practice condition. Discuss the consequences and seriousness of each of the following two kinds of non-random sampling: (1) Subjects are not randomly sampled from some specified population and (2) subjects are not randomly assigned to conditions. In general, which type of non-random sampling is more serious?

6. Define independent and dependent variables.


7. Categorize the following variables as being qualitative or quantitative:

- Response time
- Rating of job satisfaction
- Favourite colour
- Occupation aspired to
- Number of words remembered

8. Specify the level of measurement used for the items in Question 7.

9. Categorize the variables in Question 7 as being continuous or discrete.

10. Are Greek letters used for statistics or for parameters?

11. When would the mean score of a class on a final exam be considered a statistic? When would it be considered a parameter?

12. An experiment is conducted to examine the effect of punishment on learning speed in rats. What are the independent and dependent variables?

13. For the numbers 1, 2, 4, 8, compute ΣX, ΣX², and (ΣX)².

14. ΣX = 7 and ΣX² = 21. A new variable Y is created by multiplying each X by 3. What are ΣY and ΣY² equal to?

For additional reading on this topic, students should refer to the recommended textbook for Business Statistics: Applied Business Statistics: Methods and Excel-Based Applications (3rd edition) by Trevor Wegner (page 63).


TOPIC 2

2. DESCRIBING UNIVARIATE DATA

Learning Outcomes:

In this topic you will learn what central tendency is and how it is measured.

You will also gain knowledge about shapes, graphs, ranges, etc.

2.1 CENTRAL TENDENCY Measures of central tendency are measures of the location of the middle or the centre of a distribution. The definition of "middle" or "centre" is purposely left somewhat vague so that the term "central tendency" can refer to a wide variety of measures. The mean is the most commonly used measure of central tendency. The following measures of central tendency are discussed in this text:

Mean

Median

Mode

2.2 MEAN
Arithmetic Mean
The arithmetic mean is what is commonly called the average. When the word "mean" is used without a modifier, it can be assumed that it refers to the arithmetic mean. The mean is the sum of all the scores divided by the number of scores. The formula in summation notation is μ = ΣX/N, where μ is the population mean and N is the number of scores. If the scores are from a sample, then the symbol M refers to the mean and n refers to the sample size. The formula for M is the same as the formula for μ.

The mean is a good measure of central tendency for roughly symmetric distributions but can be misleading in skewed distributions since it can be greatly influenced by extreme scores. Therefore, other statistics such as the median may be more informative for distributions such as reaction time or family income that are frequently highly skewed.

The sum of squared deviations of scores from their mean is lower than the sum of their squared deviations from any other number. For normal distributions, the mean is the most efficient measure of central tendency and is therefore the least subject to sampling fluctuations. The formal definition of the arithmetic mean is μ = E[X], where μ is the population mean of the variable X and E[X] is the expected value of X.


Geometric Mean
The geometric mean is the nth root of the product of the scores. Thus, the geometric mean of the scores 1, 2, 3, and 10 is the fourth root of 1 × 2 × 3 × 10, which is the fourth root of 60, which equals 2.78. The formula can be written as:

$$\text{Geometric mean} = \left(\prod X\right)^{1/n} = \sqrt[n]{x_1 x_2 x_3 \cdots x_n}$$

where ΠX means to take the product of all the values of X.

Example:

$$\text{Geometric mean} = \sqrt[4]{1 \times 2 \times 3 \times 10} = 2.78$$

The geometric mean can also be computed by:
1. Taking the logarithm of each number
2. Computing the arithmetic mean of the logarithms
3. Raising the base used to take the logarithms to the arithmetic mean.

The table below shows an example of this method using natural logarithms.

X Ln(X)

1 0

2 0.693147

3 1.098612

10 2.302585

Arithmetic mean of Ln(X) = 1.024
Geometric mean = EXP[1.024] = 2.78

The base of natural logarithms is approximately 2.718. The expression EXP[1.024] means that 2.718 is raised to the power 1.024. Ln(X) is the natural log of X. Naturally, you get the same result using logs to base 10, as shown below.

X Log(X)

1 0.0000

2 0.30103

3 0.47712

10 1.00000

Arithmetic mean of Log(X) = 0.44454
Geometric mean = 10^0.44454 = 2.78
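The three-step logarithm method can be checked against the direct definition with a short Python sketch. It uses only the standard `math` module and the four scores from the text; the variable names are illustrative.

```python
import math

scores = [1, 2, 3, 10]
n = len(scores)

# Direct definition: the nth root of the product of the scores.
direct = math.prod(scores) ** (1 / n)

# Log method with natural logs: mean of the logs, then raise e to that mean.
mean_ln = sum(math.log(x) for x in scores) / n
via_ln = math.exp(mean_ln)

# The same method with base-10 logs gives the same geometric mean.
mean_log10 = sum(math.log10(x) for x in scores) / n
via_log10 = 10 ** mean_log10

print(round(direct, 2), round(via_ln, 2), round(via_log10, 2))
```

Working with logarithms is also a practical way to avoid very large intermediate products when there are many scores.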


If any one of the scores is zero then the geometric mean is zero. The geometric mean does not make sense if any scores are less than zero.

The geometric mean is less affected by extreme values than is the arithmetic mean and is useful as a measure of central tendency for some positively skewed distributions.

The geometric mean is an appropriate measure to use for averaging rates. For example, consider a stock portfolio that began with a value of $1,000 and had annual returns of 13%, 22%, 12%, -5%, and -13%. The table below shows the value after each of the five years.

Year Return Value

1 13% 1,130

2 22% 1,379

3 12% 1,544

4 -5% 1,467

5 -13% 1,276

The question is how to compute the average annual rate of return. The answer is to compute the geometric mean of the returns. Instead of using the percentages, each return is represented as a multiplier indicating how much higher the value is after the year. This multiplier is 1.13 for a 13% return and 0.95 for a 5% loss. The multipliers for this example are 1.13, 1.22, 1.12, 0.95, and 0.87. The geometric mean of these multipliers is 1.05. Therefore, the average annual rate of return is 5%. The following table shows how a portfolio gaining 5% a year would end up with the same value ($1,276) as the one shown above.

Year Return Value

1 5% 1,050

2 5% 1,103

3 5% 1,158

4 5% 1,216

5 5% 1,276
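The portfolio calculation can be sketched in Python as follows. The starting value and the five returns are taken from the text; the variable names are assumptions made for illustration.

```python
import math

start_value = 1000.0
annual_returns = [0.13, 0.22, 0.12, -0.05, -0.13]

# Convert each return to a growth multiplier (13% -> 1.13, -5% -> 0.95).
multipliers = [1 + r for r in annual_returns]
years = len(multipliers)

# The geometric mean of the multipliers is the constant yearly multiplier
# that produces the same final value.
average_multiplier = math.prod(multipliers) ** (1 / years)

final_value = start_value * math.prod(multipliers)
equivalent_value = start_value * average_multiplier ** years

print(round(average_multiplier, 2), round(final_value), round(equivalent_value))
```

The arithmetic mean of the five returns would be 5.8%, which overstates the growth actually achieved; this is why the geometric mean is preferred for averaging rates.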

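Harmonic Mean
The harmonic mean is used to take the mean of sample sizes. If there are k samples with sizes n1, n2, ..., nk, then the harmonic mean is defined as:

$$\text{Harmonic mean} = \frac{k}{\frac{1}{n_1} + \frac{1}{n_2} + \cdots + \frac{1}{n_k}}$$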


For the numbers 1, 2, 3, and 10, the harmonic mean is:

$$\frac{4}{\frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{10}} = 2.069$$

This is less than the geometric mean of 2.78 and the arithmetic mean of 4.

Sample fluctuations: Sampling fluctuation refers to the extent to which a statistic takes on different values with different samples. That is, it refers to how much the statistic's value fluctuates from sample to sample. A statistic whose value fluctuates greatly from sample to sample is highly subject to sampling fluctuation.

2.3 MEDIAN
The median is the middle of a distribution: half the scores are above the median and half are below the median. The median is less sensitive to extreme scores than the mean and this makes it a better measure than the mean for highly skewed distributions. The median income is usually more informative than the mean income, for example. The sum of the absolute deviations of each number from the median is lower than the sum of absolute deviations from any other number. The mean, median, and mode are equal in symmetric, unimodal distributions. The mean is higher than the median in positively skewed distributions and lower than the median in negatively skewed distributions.

Computation of Median
Sort the data values in ascending order before calculating the median. When there is an odd number of numbers, the median is simply the middle number. For example, the median of 2, 4, and 7 is 4. When there is an even number of numbers, the median is the mean of the two middle numbers. Thus, the median of the numbers 2, 4, 7, 12 is (4+7)/2 = 5.5.

2.4 MODE
The mode is the most frequently occurring score in a distribution and is used as a measure of central tendency. The advantage of the mode as a measure of central tendency is that its meaning is obvious. Further, it is the only measure of central tendency that can be used with nominal data. The mode is greatly subject to sampling fluctuations and is therefore not recommended as the only measure of central tendency. A further disadvantage of the mode is that many distributions have more than one mode. These distributions are called "multimodal." In a normal distribution, the mean, median, and mode are identical.
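The harmonic mean, median, and mode described above are all available in Python's standard `statistics` module. The sketch below reuses the numbers from the text; the small data set passed to `multimode` is hypothetical and is included only to show a distribution with two modes.

```python
import statistics

scores = [1, 2, 3, 10]

arithmetic = statistics.mean(scores)
geometric = statistics.geometric_mean(scores)
harmonic = statistics.harmonic_mean(scores)

# median() sorts the data itself, so the input order does not matter.
median_odd = statistics.median([7, 2, 4])        # middle value of 2, 4, 7
median_even = statistics.median([2, 4, 7, 12])   # mean of the two middle values

# Hypothetical data with two modes; multimode() returns every tied value.
modes = statistics.multimode([1, 2, 2, 3, 3, 4])

print(arithmetic, round(geometric, 2), round(harmonic, 3))
print(median_odd, median_even, modes)
```

For these scores the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean, as stated in the text.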


Summary:

Of the five measures of central tendency discussed, the mean is by far the most widely used. It takes every score into account, is the most efficient measure of central tendency for normal distributions and is mathematically tractable making it possible for statisticians to develop statistical procedures for drawing inferences about means.

On the other hand, the mean is not appropriate for highly skewed distributions and is less efficient than other measures of central tendency when extreme scores are possible. The geometric mean is a viable alternative if all the scores are positive and the distribution has a positive skew.

The median is useful because its meaning is clear and it is more efficient than the mean in highly-skewed distributions. However, it ignores many scores and is generally less efficient than the mean, the trimean, and trimmed means.

The mode can be informative but should almost never be used as the only measure of central tendency since it is highly susceptible to sampling fluctuations.

2.5 SPREAD
A variable's spread is the degree to which scores on the variable differ from each other. If every score on the variable were about equal, the variable would have very little spread. Two distributions can have the same mean but differ in spread, with one more spread out than the other. Variability and dispersion are synonyms for spread. There are many measures of spread; the following are discussed in this text:

Range

Semi-Interquartile Range

Variance

Standard Deviation


2.6 RANGE
The range is the simplest measure of spread or dispersion: it is equal to the difference between the largest and the smallest values. The range can be a useful measure of spread because it is so easily understood. However, it is very sensitive to extreme scores since it is based on only two values. The range should almost never be used as the only measure of spread, but can be informative if used as a supplement to other measures of spread such as the standard deviation or semi-interquartile range.

Dispersion: A variable's dispersion is the degree to which scores on the variable differ from each other. If every score on the variable were about equal, the variable would have very little dispersion. There are many measures of dispersion.

Example: The range of the numbers 1, 2, 4, 6, 12, 15, 19, 26 is 26 - 1 = 25.

2.7 SEMI-INTERQUARTILE RANGE
The semi-interquartile range is a measure of spread or dispersion. It is computed as one half the difference between the 75th percentile (often called Q3) and the 25th percentile (Q1). The formula for the semi-interquartile range is therefore (Q3 - Q1)/2. Since half the scores in a distribution lie between Q3 and Q1, the semi-interquartile range is 1/2 the distance needed to cover 1/2 the scores. In a symmetric distribution, an interval stretching from one semi-interquartile range below the median to one semi-interquartile range above the median will contain 1/2 of the scores. However, this will not be true for a skewed distribution. The semi-interquartile range is little affected by extreme scores, so it is a good measure of spread for skewed distributions. However, it is more subject to sampling fluctuation in normal distributions than is the standard deviation and is therefore not often used for data that are approximately normally distributed.

2.8 VARIANCE
The variance is a measure of how spread out a distribution is. It is computed as the average squared deviation of each number from its mean. For example, for the numbers 1, 2, and 3, the mean is 2 and the variance is:

$$\sigma^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = 0.667$$
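The three spread measures introduced so far (the range, the semi-interquartile range, and the variance of the numbers 1, 2, and 3) can be sketched in Python as follows. Note that quartiles can be defined in several slightly different ways; the "inclusive" method of `statistics.quantiles` used here is an assumption, and a textbook or spreadsheet may give somewhat different values for Q1 and Q3.

```python
import statistics

values = [1, 2, 4, 6, 12, 15, 19, 26]

# Range: largest value minus smallest value.
value_range = max(values) - min(values)

# Quartiles by the "inclusive" method (an assumption; other conventions
# give slightly different cut points for small data sets).
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
semi_iqr = (q3 - q1) / 2

# Variance of 1, 2, 3 as the average squared deviation from the mean.
small = [1, 2, 3]
mu = sum(small) / len(small)
variance = sum((x - mu) ** 2 for x in small) / len(small)

print(value_range, semi_iqr, round(variance, 3))
```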


The formula (in summation notation) for the variance in a population is

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

where μ is the mean and N is the number of scores. When the variance is computed in a sample, the statistic

$$S^2 = \frac{\sum (X - \bar{x})^2}{N}$$

(where x̄ is the mean of the sample) can be used. S² is a biased estimate of σ², however. By far the most common formula for computing variance in a sample is:

$$s^2 = \frac{\sum (X - \bar{x})^2}{N - 1}$$

which gives an unbiased estimate of σ². Since samples are usually used to estimate parameters, s² is the most commonly used measure of variance. Calculating the variance is an important part of many statistical applications and analyses.

Bias: A statistic is biased if, in the long run, it consistently overestimates or underestimates the parameter it is estimating. More technically, it is biased if its expected value is not equal to the parameter. A stopwatch that is a little bit fast gives biased estimates of elapsed time. Bias in this sense is different from the notion of a biased sample. A statistic is positively biased if it tends to overestimate the parameter; a statistic is negatively biased if it tends to underestimate the parameter. An unbiased statistic is not necessarily an accurate statistic. If a statistic is sometimes much too high and sometimes much too low, it can still be unbiased. It would be very imprecise, however. A slightly biased statistic that systematically results in very small overestimates of a parameter could be quite efficient.

Biased sample: A biased sample is one in which the method used to create the sample results in samples that are systematically different from the population. For instance, consider a research project on attitudes toward sex. Collecting the data by publishing a questionnaire in a magazine and asking people to fill it out and send it in would produce a biased sample. People interested enough to spend their time and energy filling out and sending in the questionnaire are likely to have different attitudes toward sex than those not taking the time to fill out the questionnaire. It is important to realize that it is the method used to create the sample, not the actual make-up of the sample itself, that defines the bias. A random sample that is very different from the population is not biased: it is by definition not systematically different from the population. It is randomly different.
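The claim that dividing by N gives a biased estimate of the variance while dividing by N - 1 gives an unbiased one can be illustrated with a small enumeration in Python. This is a sketch under stated assumptions: the population is taken to be the numbers 1, 2, and 3 from the variance example, and every possible sample of size 2 drawn with replacement is listed, so that the long-run average of each statistic can be computed exactly. The helper name `sample_variance` is hypothetical.

```python
from itertools import product
from statistics import mean, pvariance

population = [1, 2, 3]
sigma_squared = pvariance(population)  # population variance, 2/3

def sample_variance(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by divisor."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample) / divisor

# Every possible sample of size 2 drawn with replacement (9 samples).
n = 2
samples = list(product(population, repeat=n))

# Long-run average of each statistic over all equally likely samples.
avg_biased = mean([sample_variance(s, n) for s in samples])        # divide by N
avg_unbiased = mean([sample_variance(s, n - 1) for s in samples])  # divide by N - 1

print(round(sigma_squared, 3), round(avg_biased, 3), round(avg_unbiased, 3))
```

On average the statistic that divides by N comes out at half the population variance in this example (it is negatively biased), whereas the statistic that divides by N - 1 averages exactly to the population variance.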


2.9 STANDARD DEVIATION
The formula for the standard deviation is very simple: it is the square root of the variance. It is the most commonly used measure of spread. An important attribute of the standard deviation as a measure of spread is that if the mean and standard deviation of a normal distribution are known, it is possible to compute the percentile rank associated with any given score. In a normal distribution, about 68% of the scores are within one standard deviation of the mean and about 95% of the scores are within two standard deviations of the mean.

The standard deviation has proven to be an extremely useful measure of spread in part because it is mathematically tractable. Many formulas in inferential statistics use the standard deviation. Although less sensitive to extreme scores than the range, the standard deviation is more sensitive than the semi-interquartile range. Thus, the standard deviation should be supplemented by the semi-interquartile range when the possibility of extreme scores is present.

Standard Deviation as a Measure of Risk
The standard deviation is often used by investors to measure the risk of a stock or a stock portfolio. The basic idea is that the standard deviation is a measure of volatility: the more a stock's returns vary from the stock's average return, the more volatile the stock. Consider two stock portfolios, A and B, and their respective monthly returns (in percent) over the last six months. Both portfolios end up increasing in value from $1,000 to $1,058. However, they clearly differ in volatility. Portfolio A's monthly returns range from -1.5% to 3% whereas Portfolio B's range from -9% to 12%. The standard deviation of the returns is a better measure of volatility than the range because it takes all the values into account. The standard deviation of the six returns for Portfolio A is 1.52; for Portfolio B it is 7.24.
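The 68% and 95% figures for the normal distribution, and the idea of converting a score to a percentile rank once the mean and standard deviation are known, can be illustrated with `statistics.NormalDist` from the Python standard library. The mean of 100 and standard deviation of 15 below are hypothetical values chosen only for the illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of scores within one and within two standard deviations of the mean.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

# Hypothetical normal distribution of scores: mean 100, standard deviation 15.
scores = NormalDist(mu=100, sigma=15)

# Percentile rank of a score of 115, which is one standard deviation above the mean.
percentile_rank = 100 * scores.cdf(115)

print(round(within_one_sd, 2), round(within_two_sd, 2), round(percentile_rank, 1))
```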


Summary:

The standard deviation is by far the most widely used measure of spread. It takes every score into account, has extremely useful properties when used with a normal distribution, and is mathematically tractable; it therefore appears in many formulas in inferential statistics.

The standard deviation is not a good measure of spread in highly-skewed distributions and should be supplemented in those cases by the semi-interquartile range.

The range is a useful statistic to know, but it cannot stand alone as a measure of spread since it takes into account only two scores.

The semi-interquartile range is rarely used as a measure of spread, in part because it is not very mathematically tractable. However, it is influenced less by extreme scores than the standard deviation, is less subject to sampling fluctuations in highly-skewed distributions, and has a good intuitive meaning. It should be used to supplement the standard deviation in most cases.


2.10 SHAPE OF THE DISTRIBUTION

The concept of the shape of the distribution refers to the shape of a probability distribution and it most often arises in questions of finding an appropriate distribution to use to model the statistical properties of a population, given a sample from that population. The shape of a distribution may be considered either descriptively, using terms such as "J-shaped", or numerically, using quantitative measures such as skewness and kurtosis.

2.11 SKEWNESS

A distribution is skewed if one of its tails is longer than the other. The first distribution shown has a positive skew. This means that it has a long tail in the positive direction. The distribution below it has a negative skew since it has a long tail in the negative direction. Finally, the third distribution is symmetric and has no skew. Distributions with positive skew are sometimes called "skewed to the right" whereas distributions with negative skew are called "skewed to the left."

Distributions with positive skew are more common than distributions with negative skews. One example is the distribution of income. Most people make under $40,000 a year, but some make quite a bit more with a small number making many millions of dollars per year. The positive tail therefore extends out quite a long way whereas the negative tail stops at zero. For a more psychological example, a distribution with a positive skew typically results if the time it takes to make a response is measured. The longest response times are usually much longer than typical response times whereas the shortest response times are seldom much less than the typical response time. A histogram of the author's performance on a perceptual motor task in which the goal is to move the mouse to and click on a small target as quickly as possible is shown below. The X axis shows times in milliseconds.


Negatively skewed distributions do occur, however. Consider the following frequency polygon of test grades on a statistics test where most students did very well but a few did poorly. It has a large negative skew.

Skew can be calculated as:

$$ \text{skew} = \frac{\sum (X - \mu)^3}{N \sigma^3} $$

where μ is the mean, σ is the standard deviation and N is the number of scores. The normal distribution has a skew of 0 since it is a symmetric distribution. As a general rule, the mean is larger than the median in positively skewed distributions and less than the median in negatively skewed distributions. Although counter examples can be found, they are very rare in real data.

2.12 KURTOSIS

Kurtosis is based on the size of a distribution's tails. Distributions with relatively large tails are called "leptokurtic"; those with small tails are called "platykurtic". A distribution with the same kurtosis as the normal distribution is called "mesokurtic". The following formula can be used to calculate kurtosis:

$$ \text{kurtosis} = \frac{\sum (X - \mu)^4}{N \sigma^4} - 3 $$

where μ is the mean, σ is the standard deviation and N is the number of scores. The kurtosis of a normal distribution is 0. The following two distributions have the same variance and approximately the same skew, but differ markedly in kurtosis.
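The sketch below shows one way skew and kurtosis could be computed. It assumes the usual moment-based definitions, skew = Σ(X − μ)³ / (Nσ³) and kurtosis = Σ(X − μ)⁴ / (Nσ⁴) − 3, with σ taken as the population standard deviation; the two small datasets are hypothetical and serve only to show the sign of the result.

```python
import math

def skew_and_kurtosis(scores):
    """Moment-based skew and kurtosis (a normal distribution scores 0 on both)."""
    n = len(scores)
    mu = sum(scores) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in scores) / n)
    skew = sum((x - mu) ** 3 for x in scores) / (n * sigma ** 3)
    kurtosis = sum((x - mu) ** 4 for x in scores) / (n * sigma ** 4) - 3
    return skew, kurtosis

# Hypothetical data: one symmetric set, one with a long tail to the right.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print(skew_and_kurtosis(symmetric))     # skew is 0 for symmetric data
print(skew_and_kurtosis(right_tailed))  # skew is positive
```

Statistical packages may apply small-sample corrections, so their values can differ slightly from this uncorrected version.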

2.13 TYPES OF GRAPHS

a. Frequency Polygons

A frequency polygon is a graphical display of a frequency table. The intervals are shown on the X-axis and the number of scores in each interval is represented by the height of a point located above the middle of the interval. The points are connected so that together with the X-axis they form a polygon.


A frequency table and a relative frequency polygon for response times in a study on weapons and aggression are shown below. The times are in hundredths of a second.

Lower Limit   Upper Limit   Count   Cumulative Count   Percent   Cumulative Percent
25            30            1       1                  3.12      3.12
30            35            4       5                  12.48     15.62
35            40            8       13                 24.96     40.62
40            45            15      28                 46.80     87.50
45            50            3       31                 9.36      96.88
50            55            1       32                 3.12      100.00

Note: Values in each category are greater than the lower limit and less than or equal to the upper limit.

Frequency polygons can be based on the actual frequencies or the relative frequencies. When based on relative frequencies, the percentage of scores instead of the number of scores in each category is plotted. In a cumulative frequency polygon, the number of scores (or the percentage of scores) up to and including the category in question is plotted. A cumulative frequency polygon is shown below.
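The cumulative counts and percentages that a cumulative frequency polygon plots can be derived from the interval counts, as in the sketch below. It uses the counts from the response-time frequency table; the variable names are illustrative only.

```python
from itertools import accumulate

lower_limits = [25, 30, 35, 40, 45, 50]
upper_limits = [30, 35, 40, 45, 50, 55]
counts = [1, 4, 8, 15, 3, 1]

total = sum(counts)
# Running total of the counts: the height of each cumulative polygon point.
cumulative_counts = list(accumulate(counts))
percents = [100 * c / total for c in counts]
cumulative_percents = [100 * c / total for c in cumulative_counts]

for row in zip(lower_limits, upper_limits, counts,
               cumulative_counts, percents, cumulative_percents):
    print("{:>3} {:>3} {:>3} {:>3} {:>7.2f} {:>7.2f}".format(*row))
```

Percentages computed directly from the counts may differ in the second decimal place from a table that was built from already rounded values.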

b. Histograms

A histogram is constructed from a frequency table. The intervals are shown on the X-axis and the number of scores in each interval is represented by the height of a rectangle located above the interval. A histogram of the response times from the dataset Target RT is shown below.


The shapes of histograms will vary depending on the choice of the size of the intervals. A bar graph is much like a histogram, differing in that a small distance separates the columns from each other. Bar graphs are commonly used for qualitative variables.

c. Stem and Leaf Displays

A stem and leaf plot is much like a histogram except it portrays a little more information. A stem and leaf plot of the tournament players from the dataset "chess" as well as the data themselves are shown to the right.

The largest value, 85.3, is approximated as 10 x 8 + 5. This is represented in the plot as a stem of 8 and a leaf of 5. It is shown as the "5" in the first line of the plot. Similarly, 80.3 is approximated as 10 x 8 + 0; it has a stem of 8 and a leaf of 0. It is shown as the "0" in the first line of the plot. Depending on the data, each stem is displayed 1, 2, or 5 times. When a stem is displayed only once (as on the plot shown here), the leaves can take on the values from 0-9. When a stem is displayed twice (as in the example on the right), one stem is associated with the leaves 5-9 and the other stem is associated with the leaves 0-4.


Finally, when a stem is displayed five times, the first has the leaves 8-9, the second 6-7, the third 4-5, and so on. If positive and negative numbers are present, +0 and -0 are used as stems as they are in the plot to the right. A stem of -0 and a leaf of 7 is a value of (-0 x 1) + (-.1 x 7) = -.7.

d. Box Plots

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box and 1/4 of the distribution is between this line and the bottom of the box. The "H-spread" is defined as the difference between the hinges and a "step" is defined as 1.5 times the H-spread.

Inner fences are 1 step beyond the hinges. Outer fences are 2 steps beyond the hinges. There are two adjacent values: the largest value below the upper inner fence and the smallest value above the lower inner fence. For the data plotted in the figure, the minimum value is above the lower inner fence and is therefore the lower adjacent value. The maximum value is beyond the upper inner fence, so it is not the upper adjacent value. As shown in the figure, a line is drawn from the upper hinge to the upper adjacent value and from the lower hinge to the lower adjacent value.
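The quantities defined so far (hinges, H-spread, step, fences and adjacent values) can be sketched in Python as follows. This is an illustration under stated assumptions: the scores are hypothetical, and the hinges are taken from `statistics.quantiles` with the "inclusive" method (Python 3.8 or later), which is only one of several percentile conventions in use.

```python
import statistics

def box_plot_summary(scores):
    """Hinges, H-spread, step, fences and adjacent values for a box plot."""
    lower_hinge, median, upper_hinge = statistics.quantiles(
        scores, n=4, method="inclusive")
    h_spread = upper_hinge - lower_hinge
    step = 1.5 * h_spread
    inner = (lower_hinge - step, upper_hinge + step)          # 1 step beyond the hinges
    outer = (lower_hinge - 2 * step, upper_hinge + 2 * step)  # 2 steps beyond the hinges
    return {
        "lower_hinge": lower_hinge,
        "median": median,
        "upper_hinge": upper_hinge,
        "h_spread": h_spread,
        "step": step,
        "inner_fences": inner,
        "outer_fences": outer,
        # Most extreme scores that still lie inside the inner fences.
        "lower_adjacent": min(x for x in scores if x >= inner[0]),
        "upper_adjacent": max(x for x in scores if x <= inner[1]),
    }

# Hypothetical scores with one extreme value.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(scores)
for name, value in summary.items():
    print(name, value)
```

In this hypothetical dataset the score of 30 lies beyond the upper inner fence, so the upper whisker would stop at the upper adjacent value rather than at the maximum.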


Every score between the inner and outer fences is indicated by an "o"; a score beyond the outer fences is indicated by a "*". It is often useful to compare data from two or more groups by viewing box plots from the groups side by side. Plotted are data from Example 2a and Example 2b. The data from 2b are higher, more spread out, and have a positive skew. That the skew is positive can be determined by the fact that the mean is higher than the median and the upper whisker is longer than the lower whisker. Some computer programs present their own variations on box plots. For example, SPSS does not include the mean. JMP distinguishes between "outlier" box plots, which are the same as those described here, and quantile box plots that show the 10th, 25th, 50th, 75th, and 90th percentiles.

REVIEW QUESTIONS

1. Which of the frequency polygons has a large positive skew? Which has a large negative skew?


2. Which of the box plots has a large positive skew? Which has a large negative skew?

3. Make up a dataset of 20 numbers with a positive skew. Use a statistical program to compute the skew and to create a box plot. Is the mean larger than the median as it should be for distributions with a positive skew? What is the value for skew? Plot a frequency polygon with these data.

4. Repeat Problem 3, only this time make the dataset have a negative skew.

5. Make up two data sets that have:
(a) the same mean but differ in standard deviations.
(b) the same mean but have different medians.
(c) the same median but different means.
(d) the same semi-interquartile range but differ in standard deviations.

6. Assume the variable X has a mean of 10 and a standard deviation of 2. What would be the mean and standard deviation of a new variable (Y) that was created by multiplying each element of X by 5 and then adding 4?

7. The dataset below has data on memory for chess positions for players of three levels of expertise. The numbers represent the number of pieces correctly recalled from three chess positions. Create side-by-side box plots for these three groups. What can you say about the differences between these groups from the box plots? Consider spread as well as central tendency.


Non players:          22.1  22.3  26.2  29.6  31.7  33.5  38.9  39.7  43.2  43.2

Beginners:            32.5  37.1  39.1  40.5  45.5  51.3  52.6  55.7  55.9  57.7

Tournament Players:   40.1  45.6  51.2  56.4  58.1  71.1  74.9  75.9  80.3  85.3

8. For the numbers 1, 3, 4, 5, and […]: find the value (v) for which Σ(X − v)² is minimized. Find the value (v) for which Σ|X − v| is minimized.

9. Experiment with the sampling distribution simulation and do the exercises associated with it.

10. What is more likely to have a skewed distribution: time to solve an anagram problem or scores on a vocabulary test?

For additional reading, a student must refer to the recommended text book for Business Statistics [Applied Business Statistics, Methods and Excel-basic applications (3rd edition) by: Trevor Wegner (page 70 to page 89)]

For extra problems centred on describing univariate data, students must refer to the prescribed text book for Business Statistics [Applied Business Statistics, Methods and Excel-basic applications (3rd edition) by: Trevor Wegner (page 90)]


TOPIC 3

3. CORRELATION AND SIMPLE LINEAR REGRESSION ANALYSIS

Learning Outcomes:

In this topic you will learn about Pearson's Correlation and its computational formula.

Effects of restricted range and linear transformations on Pearson's Correlation

Knowledge on Spearman's rho

3.1. SCATTER PLOTS

A scatter plot shows the scores on one variable plotted against scores on a second variable. Below is a plot showing the relationship between grip strength and arm strength for 147 people working at physically-demanding jobs. The data are from a case study in the Rice Virtual Lab in Statistics. The plot shows a very strong but certainly not a perfect relationship between these two variables.

Scatter plots should almost always be constructed when the relationship between two variables is of interest. Statistical summaries are no substitute for a full plot of the data.


3.2. INTRODUCTION TO PEARSON'S CORRELATION

The correlation between two variables reflects the degree to which the variables are related. The most common measure of correlation is the Pearson Product Moment Correlation (called Pearson's correlation for short). When measured in a population the Pearson Product Moment correlation is designated by the Greek letter rho (ρ). When computed in a sample, it is designated by the letter "r" and is sometimes called "Pearson's r." Pearson's correlation reflects the degree of linear relationship between two variables. It ranges from +1 to -1. A correlation of +1 means that there is a perfect positive linear relationship between variables. The scatter plot shown on this page depicts such a relationship. It is a positive relationship because high scores on the X-axis are associated with high scores on the Y-axis.

A correlation of -1 means that there is a perfect negative linear relationship between variables. The scatter plot shown to the right depicts a negative relationship. It is a negative relationship because high scores on the X-axis are associated with low scores on the Y-axis. A correlation of 0 means there is no linear relationship between the two variables. The second graph shows a Pearson correlation of 0.

Correlations are rarely if ever 0, 1, or -1. Some real data showing a moderately high correlation are shown below.


The scatter plot below shows arm strength as a function of grip strength for 147 people working in physically-demanding jobs. The plot reveals a moderate positive relationship. The value of Pearson's correlation is 0.63.

Computing Pearson's correlation coefficient

The formula for Pearson's correlation takes on many forms. A commonly used formula is shown on the right. The formula looks a bit complicated, but taken step by step as shown in the numerical example, it is simple. A simpler looking formula can be used if the numbers are converted into z scores:

$$ r = \frac{\sum z_x z_y}{N} $$

where z_x is the variable X converted into z scores, z_y is the variable Y converted into z scores, and N is the number of pairs of scores (the z scores being computed with the population standard deviation).

Numerical example:

X   Y
1   2
2   5
3   6
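As a hedged illustration, the Python below computes Pearson's r for the three X-Y pairs of the numerical example in two ways: with a raw-score computational formula and with z scores. Both formulas are the standard textbook forms and are assumed here, since the guide's own formula images are not reproduced; the z scores use the population standard deviation (dividing by N).

```python
import math

def pearson_r(xs, ys):
    """Raw-score computational formula for Pearson's r."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

def z_scores(values):
    """Convert values to z scores using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r_from_z(xs, ys):
    """Mean of the products of paired z scores."""
    return sum(zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))) / len(xs)

x = [1, 2, 3]
y = [2, 5, 6]
print(round(pearson_r(x, y), 4))         # about 0.96
print(round(pearson_r_from_z(x, y), 4))  # same value
```

The two routes agree, which is why the z-score form can be described as a simpler-looking version of the same calculation.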


[Figure: Example values of r]

Effect of restricted range on Pearson's Correlation

Consider a study that investigated the correlation between arm strength and grip strength for 147 people working in physically-demanding jobs. Do you think the correlation was higher for all 147 workers tested, or for the workers who were above the median in grip strength? The upper portion of the figure below shows the scatter plot for the entire sample of 147 workers. The lower portion of the figure shows the scatter plot for the 73 workers who scored highest on grip strength. The correlation is 0.63 for the sample of 147 but only 0.47 for the sample of 73.


Whenever a sample has a restricted range of scores, the correlation will be reduced. To take the most extreme example, consider what the correlation between high-school GPA and college GPA would be in a sample where every student had the same high-school GPA. The correlation would necessarily be 0.0.

How would you interpret r values? When do we say that a weak, moderate or strong relationship exists?

Effect of linear transformations on Pearson's Correlation

A linear transformation of a variable involves multiplying each value of the variable by one number and then adding a second number. For example, consider the variable X with the following three values:

X
2
3
7

One linear transformation of the variable would be to multiply each value by 2 and then to add 5. If the transformed variable is called Y, then Y = 2X + 5. The values of Y are:

X   Y
2   9
3   11
7   19
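The transformation Y = 2X + 5 can be sketched in a few lines of Python. The sketch also checks, for a hypothetical third variable W that is not part of this guide, that Pearson's r with W is the same before and after the transformation; this holds when the multiplier is positive. The helper names are illustrative only.

```python
import math

def linear_transform(values, b, a):
    """Return b * x + a for every x in values."""
    return [b * x + a for x in values]

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [2, 3, 7]
y = linear_transform(x, b=2, a=5)
print(y)  # [9, 11, 19]

w = [1, 4, 5]  # hypothetical third variable, for illustration only
print(pearson_r(x, w), pearson_r(y, w))  # the two correlations match
```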


You can see why it is called a linear transformation by looking at the plot on the right. Let b stand for the coefficient each number is multiplied by and A stand for the constant that is added. Y is a linear transformation of X if

Y = bX + A

Linear transformations are very common. For example, the Fahrenheit scale of temperature is a linear transformation of the Centigrade scale:

F = 1.8 C + 32

A linear transformation is used to convert from tons to pounds (b = 2000 and A = 0):

Pounds = 2000 x Tons + 0

Linear transformations have no effect on Pearson's correlation coefficient, provided the multiplier b is positive (a negative multiplier reverses the sign of the correlation). Thus, the correlation between height and weight is the same regardless of whether height is measured in inches, feet, centimetres or even miles. This is a very desirable property since, with the exception of ratio scales, choices among measurement scales that are linear transformations of each other are arbitrary. For instance, scores on the Scholastic Aptitude Test (SAT) range from 200-800. It was an arbitrary decision to set 200 to 800 as the range. The test would not be any different if 100 points were subtracted from each score and then each score were multiplied by 3. Scores on the SAT would then range from 300-2100. The Pearson's correlation between SAT and some other variable (such as college grade point average) would not be affected by this linear transformation.

Spearman's rho

Spearman's rho is a measure of the linear relationship between the ranks of two variables. It differs from Pearson's correlation only in that the computations are done after the numbers are converted to ranks. When converting to ranks, the smallest value on X becomes a rank of 1, etc. Consider the following X-Y pairs:

X   Y
7   4
5   7
8   9
9   8


Converting these to ranks would result in the following:

X   Y
2   1
1   2
3   4
4   3

The first value of X (which was a 7) is converted into a 2 because 7 is the second lowest value of X. The X value of 5 is converted into a 1 since it is the lowest. Spearman's rho can be computed with the formula for Pearson's r using the ranked data.

Spearman's Rho / Rank Correlation Coefficient

This is correlation by ranks: the data are ranked before the correlation is computed.
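A minimal Python sketch of this idea follows: the four X-Y pairs used earlier in this section (7,4; 5,7; 8,9; 9,8) are converted to ranks, and Pearson's r is then computed on the ranks. The ranking helper assumes there are no tied values, and the function names are illustrative only.

```python
import math

def to_ranks(values):
    """Rank 1 goes to the smallest value; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [7, 5, 8, 9]
y = [4, 7, 9, 8]

rank_x = to_ranks(x)  # [2, 1, 3, 4]
rank_y = to_ranks(y)  # [1, 2, 4, 3]

# Spearman's rho is Pearson's r applied to the ranks.
rho = pearson_r(rank_x, rank_y)
print(rank_x, rank_y, round(rho, 2))
```

With tied values, each tied score is usually given the average of the ranks it would occupy; that case is not handled in this sketch.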

Example:

The directors of a company decided to send a random sample of 8 sales persons to a course, which aimed to improve their selling techniques. At the end of the course, the course leader assessed each sales person. Six months after the conclusion of the course, the sales record of each sales person was obtained for that period.

For each sales person, the sales and the course leader’s assessment are given below:

Sales Person   Sales (R 000)   Course Assessment
A              38              68
B              52              84
C              42              74
D              35              57
E              70              90
F              62              80
G              46              66
H              44              64

1. Compute Spearman's rho.

2. Compute the coefficient of determination.

1. To compute Spearman's rho, first rank the data given above. Compute the difference (d) between the rank of X (RX), the course assessment, and the rank of Y (RY), the sales. Then square the figures in the d column to obtain the d² column. Compute ∑d², which is substituted into the formula for Spearman's rho:


$$ r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} $$

Sales Person   X (Course assessment)   Y (Sales)   RX   RY   d    d²
A              68                      38          4    2    2    4
B              84                      52          7    6    1    1
C              74                      42          5    3    2    4
D              57                      35          1    1    0    0
E              90                      70          8    8    0    0
F              80                      62          6    7    -1   1
G              66                      46          3    5    -2   4
H              64                      44          2    4    -2   4

∑d² = 18 and n = 8. Substituting into the formula:

$$ r = 1 - \frac{6 \times 18}{8(8^2 - 1)} = 1 - 0.214285714 = 0.7857 $$

r = 0.79

2. Coefficient of determination = r²

If r = 0.79, then r² = 0.79² = 0.62. It is usually given as a percentage, i.e. the coefficient of determination = 62%.

MEANING: 62% of the variation in sales is explained by variation in course assessment, i.e. 38% of the variation in sales is due to other factors.
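The worked example can be checked with a short Python sketch that ranks both variables, sums the squared rank differences and applies the rank-difference formula r = 1 − 6Σd² / (n(n² − 1)). The data are the eight sales persons from the example; the helper name `to_ranks` is illustrative and assumes no tied values.

```python
def to_ranks(values):
    """Rank 1 goes to the smallest value; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Sales persons A to H, in the order given in the example.
assessment = [68, 84, 74, 57, 90, 80, 66, 64]  # X
sales = [38, 52, 42, 35, 70, 62, 46, 44]       # Y, in R 000

rank_x = to_ranks(assessment)
rank_y = to_ranks(sales)

n = len(assessment)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Coefficient of determination, using rho rounded to two decimals as in the example.
r_squared = round(rho, 2) ** 2

print(sum_d2, round(rho, 4), round(r_squared, 2))
```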

3.3 REGRESSION ANALYSIS

Regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'Criterion Variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed.

Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.


Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. However, this can lead to illusions or false relationships, so caution is advisable; for example, correlation does not imply causation.

Many techniques for carrying out regression analysis have been developed. Familiar methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data. Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional.

The performance of regression analysis methods in practice depends on the form of the data-generating process, and how it relates to the regression approach being used. Since the true form of the data-generating process is generally not known, regression analysis often depends to some extent on making assumptions about this process. These assumptions are sometimes testable if a sufficient quantity of data is available. Regression models for prediction are often useful even when the assumptions are moderately violated, although they may not perform optimally. However, in many applications, especially with small effects or questions of causality based on observational data, regression methods can give misleading results.
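To make the idea of a parametric regression function concrete, the sketch below fits a straight line Y = a + bX by ordinary least squares. The estimation formulas used, b = Σ(X − mean of X)(Y − mean of Y) / Σ(X − mean of X)² and a = mean of Y − b × mean of X, are the standard least-squares results and are an assumption here, since this guide does not state them. The data are the three X-Y pairs from the numerical example in section 3.2, and the function name is illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates (a, b) for the line Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

x = [1, 2, 3]
y = [2, 5, 6]

a, b = fit_line(x, y)
print(a, b)

# Predicted value of Y for a new X. Predictions outside the observed
# range of X should be treated with caution.
predicted = a + b * 4
print(predicted)
```

The fitted line always passes through the point formed by the mean of X and the mean of Y, which gives a quick check on hand calculations.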

SUMMARY

The correlation between two variables reflects the degree to which the variables are related.

It ranges from +1 to -1.

Pearson's correlation can be computed from z scores as $r = \frac{\sum z_x z_y}{N}$, where N is the number of pairs of scores.

Spearman's correlation differs from Pearson's correlation only in that the computations are done after the numbers are converted to ranks.

Spearman's correlation is computed by: $r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$

Regression analysis is a statistical process for estimating the relationships among variables.

Linear regression is the model of the form: Y = a + bX


ASSESSMENT QUESTIONS

1. Make up 3 datasets each with 10 observations (rows) and 2 variables (columns). One should have a correlation greater than .8; one should have a correlation between .4 and .7; one should have a correlation between -.20 and +.20.

2. Compute Pearson's r and Spearman's rho for the following data:

1, 2
5, 9
6, 8
12, 20

For additional knowledge on correlation and simple linear regression, students must refer to the prescribed text book for Business Statistics [Applied Business Statistics, Methods and Excel-basic applications (3rd edition) by: Trevor Wegner (page 299)].

Students must, in addition, do the activity from the prescribed text book [Applied Business Statistics, Methods and Excel-basic applications (3rd edition) by: Trevor Wegner (page 315)].


TOPIC 4

4. INTRODUCTION TO PROBABILITY

___________________________________________________________________________

Learning Outcomes:

Knowledge of concepts, terms, and symbols used in probability and the two types of probability, simple and conditional.

In this chapter you will learn about Binomial distribution.

4.1 SIMPLE PROBABILITY

What is the probability that a card drawn at random from a deck of cards will be an ace? Since there are 52 cards in the deck and 4 of them are aces, the probability is 4/52. In general, the probability of an event is the number of favorable outcomes divided by the total number of possible outcomes. (This assumes the outcomes are all equally likely.) In this case there are four favorable outcomes: (1) the ace of spades, (2) the ace of hearts, (3) the ace of diamonds, and (4) the ace of clubs. Since each of the 52 cards in the deck represents a possible outcome, there are 52 possible outcomes.

The same principle can be applied to the problem of determining the probability of obtaining different totals from a pair of dice. As shown below, there are 36 possible outcomes when a pair of dice is thrown.


To calculate the probability that the sum of the two dice will equal 5, calculate the number of outcomes that sum to 5 and divide by the total number of outcomes (36). Since four of the outcomes have a total of 5 (1,4; 2,3; 3,2; 4,1), the probability of the two dice adding up to 5 is 4/36 = 1/9. In like manner, the probability of obtaining a sum of 12 is computed by dividing the number of favourable outcomes (there is only one) by the total number of outcomes (36). The probability is therefore 1/36.
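The counting argument above can be reproduced by listing all 36 equally likely outcomes and counting the favourable ones, as in the following Python sketch. The helper name `probability` is illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for a pair of dice: (die 1, die 2).
outcomes = list(product(range(1, 7), repeat=2))

def probability(event):
    """Favourable outcomes divided by the total number of outcomes."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

p_sum_5 = probability(lambda o: sum(o) == 5)
p_sum_12 = probability(lambda o: sum(o) == 12)

print(len(outcomes), p_sum_5, p_sum_12)
```

Using `Fraction` keeps the results as exact fractions such as 1/9 rather than rounded decimals.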

A probability is nothing but the chance that something of interest will happen.

4.2 CONDITIONAL PROBABILITY

A conditional probability is the probability of an event given that another event has occurred. For example, what is the probability that the total of two dice will be greater than 8 given that the first die is a 6? This can be computed by considering only outcomes for which the first die is a 6. Then, determine the proportion of these outcomes that total more than 8. All the possible outcomes for two dice are shown in the section on simple probability. There are 6 outcomes for which the first die is a 6, and of these, there are four that total more than 8 (6,3; 6,4; 6,5; 6,6). The probability of a total greater than 8 given that the first die is 6 is therefore 4/6 = 2/3. More formally, this probability can be written as:

p (total>8 | Die 1 = 6) = 2/3.

In this equation, the expression to the left of the vertical bar represents the event and the expression to the right of the vertical bar represents the condition. Thus it would be read as "The probability that the total is greater than 8 given that Die 1 is 6 is 2/3." In more abstract form, p (A|B) is the probability of event A given that event B occurred.

4.3 PROBABILITY OF A AND B

If A and B are Independent

A and B are two events. If A and B are independent, then the probability that events A and B both occur is: p (A and B) = p (A) x p (B). In other words, the probability of A and B both occurring is the product of the probability of A and the probability of B. What is the probability that a fair coin will come up with heads twice in a row? Two events must occur: a head on the first toss and a head on the second toss. Since the probability of each event is 1/2, the probability of both events is: 1/2 x 1/2 = 1/4. Now consider a similar problem: Someone draws a card at random out of a deck, replaces it, and then draws another card at random. What is the probability that the first card is the ace of clubs and the second card is a club (any club)? Since there is only one ace of clubs in the deck, the probability of the first event is 1/52. Since 13/52 = 1/4 of the deck is composed of clubs, the probability of the second event is 1/4. Therefore, the probability of both events is: 1/52 x 1/4 = 1/208
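The conditional probability from section 4.2 and the product rule for independent events can both be checked with a short Python sketch. The function name `conditional_probability` is illustrative only; the dice outcomes are enumerated, while the coin and card probabilities are entered directly from the text.

```python
from fractions import Fraction
from itertools import product

dice = list(product(range(1, 7), repeat=2))  # (die 1, die 2)

def conditional_probability(event, condition, outcomes):
    """p(event | condition): restrict to outcomes meeting the condition first."""
    given = [o for o in outcomes if condition(o)]
    both = [o for o in given if event(o)]
    return Fraction(len(both), len(given))

# p(total > 8 | Die 1 = 6)
p_total_gt_8_given_6 = conditional_probability(
    lambda o: sum(o) > 8, lambda o: o[0] == 6, dice)

# Independent events: p(A and B) = p(A) x p(B)
p_two_heads = Fraction(1, 2) * Fraction(1, 2)
p_ace_of_clubs_then_club = Fraction(1, 52) * Fraction(13, 52)

print(p_total_gt_8_given_6, p_two_heads, p_ace_of_clubs_then_club)
```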

If A and B are Not Independent

If A and B are not independent, then the probability of A and B is p (A and B) = p (A) x p (B|A) where p (B|A) is the conditional probability of B given A.

If someone draws a card at random from a deck and then, without replacing the first card, draws a second card, what is the probability that both cards will be aces? Event A is that the first card is an ace. Since 4 of the 52 cards are aces, p (A) = 4/52 = 1/13. Given that the first card is an ace, what is the probability that the second card will be an ace as well? Of the 51 remaining cards, 3 are aces. Therefore, p (B|A) = 3/51 = 1/17 and the probability of A and B is: 1/13 x 1/17 = 1/221.

Independent: In probability theory, two events are independent if the occurrence of one is unrelated to the probability of the occurrence of the other. Getting heads the second time a fair coin is tossed is independent of getting heads on the first toss. There is simply no valid way to predict the second outcome from knowledge of the first.

4.4 PROBABILITY OF A OR B

Two events are mutually exclusive if it is not possible for both of them to occur at the same time. For example, if a die is rolled, the event "getting a 1" and the event "getting a 2" are mutually exclusive since it is not possible for the die to be both a one and a two on the same roll. The occurrence of one event "excludes" the possibility of the other event.

If events A and B are mutually exclusive, then the probability of A or B is simply:

P (A or B) = p (A) + p (B).


What is the probability of rolling a die and getting either a 1 or a 6? Since it is impossible to get both a 1 and a 6, these two events are mutually exclusive. Therefore,

p (1 or 6) = p (1) + p (6) = 1/6 + 1/6 = 1/3.

If the events A and B are not mutually exclusive, then

p (A or B) = p (A) + p (B) - p (A and B).

The logic behind this formula is that when p (A) and p (B) are added, the occasions on which A and B both occur are counted twice. To adjust for this, p (A and B) is subtracted. What is the probability that a card selected from a deck will be either an ace or a spade? The relevant probabilities are:

p (ace) = 4/52
p (spade) = 13/52

The only way in which an ace and a spade can both be drawn is to draw the ace of spades. There is only one ace of spades, so: p (ace and spade) = 1/52. The probability of an ace or a spade can be computed as:

p (ace) + p (spade) - p (ace and spade) = 4/52 + 13/52 - 1/52 = 16/52 = 4/13.

Consider the probability of rolling a die twice and getting a 6 on at least one of the rolls. The events are defined in the following way:

Event A: 6 on the first roll: p(A) = 1/6
Event B: 6 on the second roll: p(B) = 1/6
p(A and B) = 1/6 x 1/6
p(A or B) = 1/6 + 1/6 - 1/6 x 1/6 = 11/36

The same answer can be computed using the following admittedly convoluted approach: Getting a 6 on either roll is the same thing as not getting a number from 1 to 5 on both rolls. This is equal to: 1 - p (1 to 5 on both rolls). The probability of getting a number from 1 to 5 on the first roll is 5/6. Likewise, the probability of getting a number from 1 to 5 on the second roll is 5/6. Therefore, the probability of getting a number from 1 to 5 on both rolls is: 5/6 x 5/6 = 25/36. This means that the probability of not getting a 1 to 5 on both rolls (getting a 6 on at least one roll) is: 1 - 25/36 = 11/36.

Despite the convoluted nature of this method, it has the advantage of being easy to generalize to three or more events. For example, the probability of rolling a die three times and getting a six on at least one of the three rolls is:

1 - 5/6 x 5/6 x 5/6 = .421
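Both the addition rule and the complement approach can be checked by counting. The sketch below is an illustration only: it builds a 52-card deck, compares the addition rule with a direct count, and applies the complement rule to the dice example. The helper names proportion and p_at_least_one_six are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

# Build a 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))

def proportion(cards, condition):
    """Fraction of cards for which condition(rank, suit) is true."""
    return Fraction(sum(condition(r, s) for r, s in cards), len(cards))

p_ace = proportion(deck, lambda r, s: r == "A")
p_spade = proportion(deck, lambda r, s: s == "spades")
p_ace_and_spade = proportion(deck, lambda r, s: r == "A" and s == "spades")

# Addition rule versus a direct count of cards that are an ace or a spade.
p_by_rule = p_ace + p_spade - p_ace_and_spade
p_by_count = proportion(deck, lambda r, s: r == "A" or s == "spades")

def p_at_least_one_six(rolls):
    """Complement rule: 1 - p(no six on any of the rolls)."""
    return 1 - Fraction(5, 6) ** rolls

print(p_by_rule, p_by_count)          # 4/13 4/13
print(p_at_least_one_six(2))          # 11/36
print(float(p_at_least_one_six(3)))   # about 0.421
```

The complement function takes the number of rolls as a parameter, which mirrors the point that this method generalizes easily to three or more events.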


Mutually exclusive: Two events are mutually exclusive if it is not possible for both of them to occur. For example, if a die is rolled, the event "getting a 1" and the event "getting a 2" are mutually exclusive since it is not possible for the die to be both a one and a two on the same roll. The occurrence of one event "excludes" the possibility of the other event.

SUMMARY

A probability is the chance that something of interest will happen.

0≤ probability ≤ 1

Sum of all probabilities is one

A conditional probability is the probability of an event given that another event has occurred

If A and B are independent events, then the probability that events A and B both occur is: p (A and B) = p (A) x p (B).

Two events are mutually exclusive if it is not possible for both of them to occur

If the events A and B are not mutually exclusive, then P (A or B) = p (A) + p (B) - p (A and B).
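The conditional probability and the dependent-events rule summarised above can also be verified by listing outcomes. This is a minimal sketch, assuming a fair die and a standard 52-card deck; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

# Conditional probability: keep only the outcomes where the condition holds.
rolls = list(product(range(1, 7), repeat=2))
first_is_six = [(a, b) for a, b in rolls if a == 6]
p_total_gt8_given_six = Fraction(
    sum(a + b > 8 for a, b in first_is_six), len(first_is_six)
)

# Dependent events: two cards drawn without replacement, both aces.
deck = ["ace"] * 4 + ["other"] * 48
draws = list(permutations(deck, 2))   # ordered pairs of distinct cards
p_both_aces = Fraction(
    sum(first == "ace" and second == "ace" for first, second in draws),
    len(draws),
)

print(p_total_gt8_given_six)   # 2/3
print(p_both_aces)             # 1/221
```

The second result agrees with p (A) x p (B|A) = 4/52 x 3/51.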

For added understanding, a student must refer to the prescribed text book for Business Statistics [Applied Business Statistics, Methods and Excel-basic applications (3rd edition) by: Trevor Wegner (page 101 to page 129)]


Students must do the exercise from the prescribed text book [Applied Business Statistics, Methods and Excel-basic applications (3rd edition) by: Trevor Wegner (page 119)].

For further theory and worked examples on probability, visit the following website:

http://onlinestatbook.com/2/probability/basic.html


TOPIC 5

5. DISCRETE PROBABILITY DISTRIBUTION

___

Learning Outcomes:

• Understanding of discrete probability distributions.
• Learners will further understand binomial and Poisson distributions and their application.
• Calculate probabilities of these discrete probability distributions.
• Equipped with counting principles (permutation and combination).

5.1 PERMUTATIONS AND COMBINATIONS

Imagine that your cash point card has just been stolen. What is the probability of the thief guessing your 4 digit PIN in one go?

P(Guess correctly) = (Number of correct PINs) / (Number of possible 4-digit PINs)

There is, of course, only one correct PIN. The number of possible 4-digit PINs is calculated as follows. There are 10 choices for the first digit, another 10 choices for the second digit, and so on. Therefore the number of possible choices is 10 × 10 × 10 × 10 = 10,000. So the probability of a correct guess is

P(Guess correctly) = 1/10,000 = 0.0001

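Permutations: A permutation is a particular arrangement of items in which the order matters.

Example: How many ways can you arrange the letters A, B, and C?

• Ans: There are 6 arrangements: ABC, ACB, BAC, BCA, CAB, CBA.

The general formula for the number of permutations of r items chosen from n is:

nPr = n!/(n-r)!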

n! is read as n factorial. In general,

n! = n ∙ (n-1) ∙ (n-2) ∙ … ∙ (2) ∙ (1) = n ∙ (n-1)!


[Note: 0! ≡ 1] So,

10! = 10 x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1 = 3,628,800

6! = 6 x 5 x 4 x 3 x 2 x 1 = 720

Example: How many ways can you assign 5 workers to 5 different tasks?

• Ans: 5P5 = 5! = 120

Example: How many ways can you arrange 10 books if you have space for 5 books in your bookcase?

• Ans: 10P5 = 10! / 5! = 10 x 9 x 8 x 7 x 6 = 30,240

Returning to the example in which the thief is trying to guess your 4-digit PIN, if the thief knows that the PIN contains no repeated digits then the number of possible PINs is

10P4 = 10 x 9 x 8 x 7 = 5040

so, assuming that each is equally likely to be guessed, the probability of a correct guess is P(Guess correctly) = 1/5040 = 0.0001984.
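The permutation formula translates directly into a few lines of Python. The sketch below is illustrative only; n_p_r is a hypothetical helper name, and the calls reproduce the worker, bookcase and PIN examples.

```python
import math
from fractions import Fraction

def n_p_r(n, r):
    """Number of ordered arrangements of r items chosen from n: n!/(n-r)!."""
    return math.factorial(n) // math.factorial(n - r)

print(n_p_r(5, 5))    # 120   (5 workers to 5 tasks)
print(n_p_r(10, 5))   # 30240 (10 books, space for 5)
print(n_p_r(10, 4))   # 5040  (4-digit PINs with no repeated digit)

# Probability of guessing a PIN when repeats are allowed and when they are not.
p_guess_with_repeats = Fraction(1, 10**4)
p_guess_no_repeats = Fraction(1, n_p_r(10, 4))
print(float(p_guess_with_repeats), float(p_guess_no_repeats))
```

Integer division (//) is used so that the counts stay whole numbers.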

Combinations: With permutations, the arrangement of the items is important. Each unique sequence is another permutation. Thus, ABC was different from BCA and both were different from CBA. With combinations, however, ABC, BCA, and CBA are all the same. They are all part of the same combination.

Example: How many different groups of 3 can be selected from 7 people? Say these people are named A, B, C, D, E, F, G. Note that once you select, say, B, D, and E, the six different arrangements you can make from them are irrelevant. The formula for combinations is:

nCr = n!/[r!(n-r)!]

Ans: 7C3 = 35

What is the probability that you will win the lotto?

Example: How many ways can one come up with 2 heads in 3 tosses of a coin?

Ans: 3C2 = 3. This can be verified by listing the outcomes: HHT, HTH, THH.
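The combination counts can be confirmed both by the formula and by listing the selections. This is a minimal sketch; n_c_r is a hypothetical helper name, and the seven people are labelled A to G purely for illustration.

```python
import math
from itertools import combinations, product

def n_c_r(n, r):
    """Number of unordered selections of r items from n: n!/[r!(n-r)!]."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Groups of 3 chosen from 7 people: count by formula and by enumeration.
people = "ABCDEFG"
groups = list(combinations(people, 3))
print(n_c_r(7, 3), len(groups))   # 35 35

# Sequences of 3 coin tosses that contain exactly 2 heads.
two_heads = ["".join(t) for t in product("HT", repeat=3) if t.count("H") == 2]
print(two_heads)                  # ['HHT', 'HTH', 'THH']
```

Listing the outcomes is practical only for small cases, but it is a useful check on the formula.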

Exercises:

1) A market survey has identified 10 desirable features for a new product. However, due to cost constraints, only four of these features can be included. If the features are selected randomly, what is the probability that your four favourites are chosen in your preferred ordering?


2) If you dial 7 digits at random on a (non-mobile) telephone in Westville, what is the probability you dial Dr. Farrow’s office number (which has 7 digits)?

5.2 BINOMIAL PROBABILITY DISTRIBUTION

In many surveys and experiments we collect data in the form of counts. For example, the number of people in the survey who bought a CD in the past month, the number of people who said they would vote ANC at the next election, the number of defective items in a sample taken from a production line, and so on. All these variables have common features:

1. Each person/item has only two possible (exclusive) responses (Yes/No, Defective/Not defective etc.). This is referred to as a trial which results in a success or failure.

2. The survey/experiment takes the form of a random sample, so the responses are independent.

Further suppose that the true probability of a success in the population is p (in which case the probability of a failure is 1 − p).

• We are interested in the random variable X, the total number of successes out of n trials.

• This random variable has a probability distribution in which the probability that X = r, that is we get r successes in our n trials, is

P(X = r) = nCr × p^r × (1 − p)^(n − r), where r = 0, 1, . . . , n.

• This distribution is known as the binomial distribution with index n and probability p. We write this as

X ~ Bin(n, p).

• The mean (expected value) of a binomial distribution is: μ = np

• The variance of a binomial distribution is: σ² = np(1 − p)
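The binomial formula, together with the mean and variance results, can be checked numerically. The following sketch is an illustration under assumed values (n = 20 and p = 0.25 are not taken from this guide); binomial_pmf is a hypothetical helper name.

```python
import math

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p): nCr * p^r * (1 - p)^(n - r)."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 20, 0.25   # illustrative values, not taken from the guide
probabilities = [binomial_pmf(r, n, p) for r in range(n + 1)]

# The probabilities sum to one; the mean and variance follow from the definitions.
total = sum(probabilities)
mean = sum(r * pr for r, pr in enumerate(probabilities))
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(probabilities))

print(round(total, 6))      # 1.0
print(round(mean, 6))       # 5.0, equal to n * p
print(round(variance, 6))   # 3.75, equal to n * p * (1 - p)
```

Computing the mean and variance from the full distribution, rather than from np and np(1 − p), shows that the shortcut formulas give the same values.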

Example: A salesperson has a 50% chance of making a sale on a customer visit and she arranges 6 visits in a day. What are the probabilities of her making 0, 1, 2, 3, 4, 5 and 6 sales?

Solution: Let X denote the number of sales. Assuming the visits result in sales independently, X ~ Bin(6, 0.5) and

No. of sales   Probability   Cumulative Probability
r              P(X = r)      P(X ≤ r)
0              0.015625      0.015625
1              0.093750      0.109375
2              0.234375      0.343750
3              0.312500      0.656250
4              0.234375      0.890625
5              0.093750      0.984375
6              0.015625      1.000000
Sum            1.000000

From these, it is straightforward to calculate cumulative probabilities such as the probability of making no more than 2 sales:

• P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.015625 + 0.09375 + 0.234375 = 0.34375.

• These cumulative probabilities are also useful in calculating probabilities such as that of making more than 1 sale:

P(X > 1) = 1 − P(X ≤ 1) = 1 − 0.109375 = 0.890625.

Example: A machine produces parts that are very difficult to make. It turns out that 1 out of 20 are defective and must be thrown out. What is the probability that a sample of 10 parts will contain 0 defectives?

• Note: If 1 out of 20 parts are defective, this means that p = .05.

• Ans: P(X = 0) = 10C0 × (.05)^0 × (.95)^10 = 0.5987
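The sales table and the defective-parts answer can be reproduced with a short script. This is a sketch for checking the arithmetic, not a prescribed method; binomial_pmf is a hypothetical helper name.

```python
import math
from itertools import accumulate

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

# Salesperson example: X ~ Bin(6, 0.5).
sales = [binomial_pmf(r, 6, 0.5) for r in range(7)]
cumulative = list(accumulate(sales))   # running totals give P(X <= r)
for r, (pr, cum) in enumerate(zip(sales, cumulative)):
    print(f"{r}  {pr:.6f}  {cum:.6f}")

p_at_most_2 = cumulative[2]          # P(X <= 2)
p_more_than_1 = 1 - cumulative[1]    # P(X > 1) = 1 - P(X <= 1)

# Defective parts example: n = 10, p = 0.05, no defectives.
p_no_defectives = binomial_pmf(0, 10, 0.05)

print(p_at_most_2, p_more_than_1, round(p_no_defectives, 4))
```

The complement step for P(X > 1) avoids adding up the five probabilities for 2 to 6 sales.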

Exercise:

In the mid 1990s, Time magazine reported that 30% of the US Congress supported a tax cut as a means of stimulating the economy and increasing tax revenues. Suppose at that time five members of Congress were randomly selected for an interview and asked whether they supported the tax cut to stimulate the economy. Find the probability that:

a) All of those interviewed were opposed to a tax cut.

b) One of the five members of Congress was in favour of a tax cut.

c) At least three of the five members were in favour of a tax cut.

d) Fewer than five members were in favour of a tax cut.

If 2.6% of all radios manufactured by a certain company are defective, find the mean and standard deviation of the number of defective radios in a batch of 50 radios.


For additional reading, a student must refer to the recommended text book for Business Statistics [Applied Business Statistics, Methods and Excel-basic applications (3rd edition) by: Trevor Wegner (page 126 to page 129)]

For further workings and understanding based on the binomial distribution access the following website.

http://onlinestatbook.com/2/probability/binomial.html

5.3 THE POISSON DISTRIBUTION

• The Poisson distribution is a very important discrete probability distribution which arises in many different contexts.

• We can think of a Poisson distribution as what becomes of a binomial distribution if we keep the mean fixed but let n become very large and p become very small, i.e. a large number of trials with a small probability of success in each.

• In general, it is used to model data which are counts of (random) events in a certain area or time interval, without a known fixed upper limit.

• If X is a random variable with a Poisson distribution with parameter λ (Greek lower case lambda) then the probability that X = r is

P(X = r) = λ^r e^(−λ) / r!, r = 0, 1, 2, 3, …

We write X ~ Po(λ). The parameter λ has a very simple interpretation as the rate at which events occur. The distribution has mean and variance

• E(X) = λ, Var(X) = λ.
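The link between the binomial and Poisson distributions can be seen numerically. In the sketch below the mean is held at an assumed value of λ = 2 while n grows and p = λ/n shrinks; the helper names and the choice r = 3 are illustrative assumptions.

```python
import math

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam): lam^r * e^(-lam) / r!."""
    return lam**r * math.exp(-lam) / math.factorial(r)

lam = 2.0   # illustrative mean
r = 3       # illustrative count
for n in (10, 100, 10_000):
    p = lam / n   # keeps the binomial mean n * p fixed at lam
    print(n, round(binomial_pmf(r, n, p), 5), round(poisson_pmf(r, lam), 5))

# Mean and variance computed from the distribution itself (tail beyond 60 is negligible).
mean = sum(k * poisson_pmf(k, lam) for k in range(60))
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in range(60))
print(round(mean, 6), round(variance, 6))
```

As n increases the binomial column moves towards the Poisson value, and both the mean and the variance of the Poisson distribution come out equal to λ.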

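Example: The management of a supermarket receives, on average, two requests per month to order items that the supermarket does not usually hold in stock. Find the probability that the supermarket receives:

a) No such requests in a month
b) From two to four such requests in a month
c) At least one request in two months

Solution: Let X denote the number of requests in a month, so that X ~ Po(2).

a) P(X = 0) = 2^0 e^(−2) / 0! = e^(−2) = 0.1353

b) P(2 ≤ X ≤ 4) = P(X = 2) + P(X = 3) + P(X = 4) = 0.2707 + 0.1804 + 0.0902 = 0.5413

c) Over two months the mean number of requests is 2 × 2 = 4, so P(at least one request) = 1 − P(no requests) = 1 − e^(−4) = 0.9817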
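The three parts of the supermarket example can be evaluated with the Poisson formula. This is a minimal sketch; poisson_pmf is a hypothetical helper name, and part c) assumes that the rate for two months is twice the monthly rate.

```python
import math

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return lam**r * math.exp(-lam) / math.factorial(r)

monthly_rate = 2   # two requests per month on average

# a) No requests in a month.
p_none = poisson_pmf(0, monthly_rate)

# b) From two to four requests in a month.
p_two_to_four = sum(poisson_pmf(r, monthly_rate) for r in (2, 3, 4))

# c) At least one request in two months (assumed rate: 2 x 2 = 4).
two_month_rate = 2 * monthly_rate
p_at_least_one = 1 - poisson_pmf(0, two_month_rate)

print(round(p_none, 4))           # 0.1353
print(round(p_two_to_four, 4))    # 0.5413
print(round(p_at_least_one, 4))   # 0.9817
```

Part c) uses the complement, since "at least one" covers every count except zero.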


For additional reading, a student must refer to the recommended text book for Business Statistics [Applied Business Statistics, Methods and Excel-basic applications (3rd edition) by: Trevor Wegner (page 129 to page 132)]

Exercise:

Assume that the number of patients who arrive at a physiotherapy practice per hour follows a Poisson distribution. By reviewing past records, it was determined that the mean is two patients per hour.

What is the probability that in a given hour exactly three patients will arrive?

What is the probability that in a given two-hour period exactly eight patients will arrive?

What is the probability that in a given half-hour period exactly two patients will arrive?

REVIEW QUESTIONS

1. What is the probability of rolling a pair of dice and obtaining a total score of 10 or more?

2. A box contains three black pieces of cloth, two striped pieces, and four dotted pieces. A piece is selected randomly and then placed back in the box. A second piece is selected randomly. What is the probability that:

(a) Both pieces are dotted?
(b) The first piece is black and the second piece is dotted?
(c) One piece is black and one piece is striped?

3. A card is drawn at random from a deck. What is the probability that it is an ace or a king?

4. A card is drawn at random from a deck. What is the probability it is either a red card, an ace, or both?

5. Two cards are drawn from a deck (without replacement). What is the probability they are both diamonds?

Students must do the exercise from the prescribed text book [Applied Business Statistics, Methods and Excel-basic applications (3rd edition) by: Trevor Wegner (page 146)].


TOPIC 6

6. CONTINUOUS PROBABILITY DISTRIBUTION

Learning Outcomes:

• Understanding of a Normal distribution and its importance.
• Knowledge of converting to percentiles and back.
• Application of the Normal distribution.

6.1 WHAT IS A NORMAL DISTRIBUTION

Normal distributions are a family of distributions that have the same general shape. They are symmetric with scores more concentrated in the middle than in the tails. Normal distributions are sometimes described as bell shaped. Examples of normal distributions are shown to the right. Notice that they differ in how spread out they are. The area under each curve is the same. The height of a normal distribution can be specified mathematically in terms of two parameters: the mean (μ) and the standard deviation (σ).

Mathematical formula for height of a normal curve

The height (ordinate) of a normal curve is defined as:

f(x) = [1 / (σ√(2π))] × e^(−(x − μ)² / (2σ²))

where μ is the mean and σ is the standard deviation, π is the constant 3.14159, and e is the base of natural logarithms and is equal to 2.718282. x can take on any value from -infinity to +infinity.


f(x) is very close to 0 if x is more than three standard deviations from the mean (less than μ - 3σ or greater than μ + 3σ).
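The height formula can be evaluated directly. The sketch below is an illustration under assumed values (a mean of 50 and a standard deviation of 10); normal_pdf is a hypothetical helper name. It compares the height at the mean with the height three standard deviations away and checks the total area with a crude numerical sum.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma**2)
    return coefficient * math.exp(exponent)

mu, sigma = 50.0, 10.0   # illustrative values, not taken from the guide

peak = normal_pdf(mu, mu, sigma)               # height at the mean
tail = normal_pdf(mu + 3 * sigma, mu, sigma)   # height three SDs above the mean
print(peak, tail, tail / peak)

# Crude numerical check that the area under the curve is 1,
# summing thin rectangles from mu - 6 sigma to mu + 6 sigma.
steps = 12_000
width = 12 * sigma / steps
area = sum(normal_pdf(mu - 6 * sigma + i * width, mu, sigma) * width
           for i in range(steps))
print(round(area, 6))
```

The ratio tail / peak shows how small the curve has become three standard deviations from the mean relative to its height at the centre.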

6.2 THE STANDARD NORMAL DISTRIBUTION

The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. Normal distributions can be transformed to standard normal distributions by the formula:

z = (X − μ) / σ

where X is a score from the original normal distribution, μ is the mean of the original normal distribution, and σ is the standard deviation of the original normal distribution. The standard normal distribution is sometimes called the z distribution. A z score always reflects the number of standard deviations above or below the mean a particular score is. For instance, if a person scored a 70 on a test with a mean of 50 and a standard deviation of 10, then they scored 2 standard deviations above the mean. Converting the test scores to z scores, an X of 70 would be:

z = (70 − 50) / 10 = 2

So, a z score of 2 means the original score was 2 standard deviations above the mean. Note that the z distribution will only be a normal distribution if the original distribution (X) is normal.
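The z transformation is a one-line calculation. The following sketch reproduces the test-score example and then standardises a small set of scores; the list of scores is hypothetical and is used only to show the effect of the transformation.

```python
from statistics import mean, pstdev

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

print(z_score(70, 50, 10))   # 2.0

# Standardising a whole set of scores (hypothetical data).
scores = [30, 40, 50, 60, 70]
mu = mean(scores)
sigma = pstdev(scores)       # population standard deviation
z_values = [z_score(x, mu, sigma) for x in scores]

print(z_values)
print(mean(z_values), pstdev(z_values))
```

After the transformation the z values have a mean of zero and a standard deviation of one (up to rounding), whatever the original mean and standard deviation were.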

Applying the formula will always produce a transformed distribution with a mean of zero and a standard deviation of one. However, the shape of the distribution will not be affected by the transformation. If X is not normal then the transformed distribution will not be normal either.

One important use of the standard normal distribution is for converting between scores from a normal distribution and percentile ranks. Areas under portions of the standard normal distribution are shown to the right. About .68 (.34 + .34) of the distribution is between -1 and 1 while about .95 of the distribution is between -2 and 2.

Why is it important?

One reason the normal distribution is important is that many psychological and educational variables are distributed approximately normally. Measures of reading ability, introversion, job satisfaction, and memory are among the many psychological variables approximately normally distributed. Although the distributions are only approximately normal, they are usually quite close.

A second reason the normal distribution is so important is that it is easy for mathematical statisticians to work with. This means that many kinds of statistical tests can be derived for normal distributions. Almost all statistical tests discussed in this text assume normal distributions. Fortunately, these tests work very well even if the distribution is only approximately normal. Some tests work well even with very wide deviations from normality.

Finally, if the mean and standard deviation of a normal distribution are known, it is easy to convert back and forth from raw scores to percentiles.

6.3 CONVERTING TO PERCENTILES AND BACK

If the mean and standard deviation of a normal distribution are known, it is relatively easy to figure out the percentile rank of a person obtaining a specific score. To be more concrete, assume a test in Introductory Psychology is normally distributed with a mean of 80 and a standard deviation of 5. What is the percentile rank of a person who received a score of 70 on the test? Mathematical statisticians have developed ways of determining the proportion of a distribution that is below a given number of standard deviations from the mean. They have shown that only 2.3% of the population will be less than or equal to a score two standard deviations below the mean. In terms of the Introductory Psychology test example, this means that a person scoring 70 would be in the 2.3rd percentile.

This graph shows the distribution of scores on the test. The shaded area is 2.3% of the total area. The proportion of the area below 70 is equal to the proportion of the scores below 70. What about a person scoring 75 on the test? The proportion of the area below 75 is the same as the proportion of scores below 75.

A score of 75 is one standard deviation below the mean because the mean is 80 and the standard deviation is 5. Mathematical statisticians have determined that 15.9% of the scores in a normal distribution are lower than a score one standard deviation below the mean. Therefore, the proportion of the scores below 75 is 0.159 and a person scoring 75 would have a percentile rank score of 15.9. The table on this page gives the proportion of the scores below various values of z.


z is computed with the formula:

z = (X − μ) / σ

where z is the number of standard deviations (σ) that X is above the mean (μ). When z is negative it means that X is below the mean. Thus, a z of -2 means that X is -2 standard deviations above the mean, which is the same thing as being +2 standard deviations below the mean.

To take another example, what is the percentile rank of a person receiving a score of 90 on the test?

The graph shows that most people scored below 90. Since 90 is 2 standard deviations above the mean [z = (90 - 80)/5 = 2] it can be determined from the table that a z score of 2 is equivalent to the 97.7th percentile: The proportion of people scoring below 90 is thus .977.
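Instead of a z table, the percentile ranks for the Introductory Psychology example can be looked up with Python's standard library. This is a sketch, assuming Python 3.8 or later for statistics.NormalDist; percentile_rank is a hypothetical helper name.

```python
from statistics import NormalDist

# Test scores: normal with a mean of 80 and a standard deviation of 5.
test_scores = NormalDist(mu=80, sigma=5)

def percentile_rank(score):
    """Percentage of the distribution that lies below the given score."""
    return 100 * test_scores.cdf(score)

for score in (70, 75, 90):
    z = (score - test_scores.mean) / test_scores.stdev
    print(score, z, round(percentile_rank(score), 1))
```

The three scores correspond to z values of -2, -1 and 2, and the printed ranks agree with the 2.3rd, 15.9th and 97.7th percentiles quoted in the text.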

What score on the Introductory Psychology test would it have taken to be in the 75th percentile?

(Remember the test has a mean of 80 and a standard deviation of 5.) The answer is computed by reversing the steps in the previous problems. First, determine how many standard deviations above the mean one would have to be to be in the 75th percentile. This can be found by using a z table and finding the z associated with .75. The value of z is .674. Thus, one must be .674 standard deviations above the mean to be in the 75th percentile. Since the standard deviation is 5, one must be (5)(.674) = 3.37 points above the mean. Since the mean is 80, a score of 80 + 3.37 = 83.37 is necessary.
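The reverse conversion, from a percentile back to a score, can be sketched in the same way. The example assumes Python 3.8 or later for statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

mean_score, sd_score = 80, 5

# Step 1: the z value that has .75 of the standard normal distribution below it.
z_75 = NormalDist().inv_cdf(0.75)

# Step 2: convert that number of standard deviations into test-score points.
score_75 = mean_score + z_75 * sd_score

print(round(z_75, 3))       # 0.674
print(round(score_75, 2))   # 83.37
```

The inv_cdf call plays the role of reading the z table backwards, from an area to a z value.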


Rounding off, a score of 83 is needed to be in the 75th percentile. Since z = (X − μ) / σ, a little algebra demonstrates that X = μ + zσ. For the present example, X = 80 + (.674)(5) = 83.37 as just shown.

6.4 AREA UNDER PORTIONS OF THE CURVE

If a test is normally distributed with a mean of 60 and a standard deviation of 10, what proportion of the scores is above 85? This problem is very similar to figuring out the percentile rank of a person scoring 85. The first step is to figure out the proportion of scores less than or equal to 85. This is done by figuring out how many standard deviations above the mean 85 is. Since 85 is 85-60 = 25 points above the mean and since the standard deviation is 10, a score of 85 is 25/10 = 2.5 standard deviations above the mean. Or, in terms of the formula,

z = (85 − 60) / 10 = 2.5

A z table can be used to calculate that .9938 of the scores are less than or equal to a score 2.5 standard deviations above the mean. It follows that only 1-.9938 = .0062 of the scores are above a score 2.5 standard deviations above the mean. Therefore, only .0062 of the scores are above 85.

Suppose you wanted to know the proportion of students receiving scores between 70 and 80. The approach is to figure out the proportion of students scoring below 80 and the proportion below 70. The difference between the two proportions is the proportion scoring between 70 and 80. First, the calculation of the proportion below 80. Since 80 is 20 points above the mean and the standard deviation is 10, 80 is 2 standard deviations above the mean.

z = (80 − 60) / 10 = 2

A z table can be used to determine that .9772 of the scores are below a score 2 standard deviations above the mean.


To calculate the proportion below 70,

z = (70 − 60) / 10 = 1

A z table can be used to determine that the proportion of scores less than 1 standard deviation above the mean is .8413. So, if .1587 of the scores are above 70 and .0228 are above 80, then .1587 - .0228 = .1359 are between 70 and 80.

Assume a test is normally distributed with a mean of 100 and a standard deviation of 15. What proportion of the scores would be between 85 and 105? The solution to this problem is similar to the solution to the last one. The first step is to calculate the proportion of scores below 85. Next, calculate the proportion of scores below 105. Finally, subtract the first result from the second to find the proportion scoring between 85 and 105. Begin by calculating the proportion below 85. 85 is one standard deviation below the mean:

z = (85 − 100) / 15 = −1

Using a z table with the value of -1 for z, the area below -1 (or 85 in terms of the raw scores) is .1587.

Doing the same thing for 105,

z = (105 − 100) / 15 = .333

A z table shows that the proportion scoring below a z of .333 (105 in raw scores) is .6304. The difference is .6304 - .1587 = .4717. So .4717 of the scores are between 85 and 105.
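The three area calculations in this section follow the same pattern, which can be sketched as a small function. The example assumes Python 3.8 or later for statistics.NormalDist; proportion_between is a hypothetical helper name.

```python
from statistics import NormalDist

def proportion_between(low, high, mu, sigma):
    """Proportion of a normal distribution lying between low and high."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(high) - dist.cdf(low)

# Mean 60, standard deviation 10: proportion above 85.
above_85 = 1 - NormalDist(60, 10).cdf(85)

# Mean 60, standard deviation 10: proportion between 70 and 80.
between_70_80 = proportion_between(70, 80, 60, 10)

# Mean 100, standard deviation 15: proportion between 85 and 105.
between_85_105 = proportion_between(85, 105, 100, 15)

print(round(above_85, 4))        # 0.0062
print(round(between_70_80, 4))   # 0.1359
print(between_85_105)
```

The last value may differ slightly in the final decimal places from a z table answer, because the table is read at a rounded z value.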


SUMMARY

Normal distributions are symmetric with scores more concentrated in the middle than in the tails

It is a bell-shaped continuous distribution

Normal distribution has two parameters, namely, the mean and standard deviation

The height of a normal distribution can be specified mathematically in terms of two parameters: the mean (μ) and the standard deviation (σ).

The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1

Normal distribution can be standardized by the formula: z = (X − μ) / σ

REVIEW QUESTIONS

1. If scores are normally distributed with a mean of 30 and a standard deviation of 5, what percent of the scores is:

(a) greater than 30? (b) greater than 37? (c) between 28 and 34?

2. (a) What are the mean and standard deviation of the standard normal distribution?

(b) What would be the mean and standard deviation of a distribution created by multiplying the standard normal distribution by 10 and then adding 50?

3. The normal distribution is defined by two parameters. What are they?

4. (a) What proportion of a normal distribution is within one standard deviation of the mean?

(b) What proportion is more than 1.8 standard deviations from the mean?

(c) What proportion is between 1 and 1.5 standard deviations above the mean?

5. A test is normally distributed with a mean of 40 and a standard deviation of 7.

(a) What score would be needed to be in the 85th percentile? (b) What score would be needed to be in the 22nd percentile?

6. Assume a normal distribution with a mean of 90 and a standard deviation of 7. What limits would include the middle 65% of the cases?

7. For this problem, use the scores in the identical blocks test (second in the dataset "High_School_Sample"). Compute the mean and standard deviation. Then, compute what the 25th and 75th percentile would be if the distribution were normal. Compare the estimates to the actual 25th and 75th percentiles.

For additional reading, a student must refer to the recommended text book for Business Statistics [Applied Business Statistics, Methods and Excel-basic applications (3rd edition) by: Trevor Wegner (page 133 to page 145)]


TOPIC 7

ADDENDUM 621 (A): REVIEW QUESTIONS

___

QUESTION ONE (30 MARKS)

1.1 A batch of 5000 electric lamps has a mean life of 1000 hours and a standard deviation of 75 hours. Assume a Normal Distribution.

REQUIRED:

1.1.1 How many lamps will fail before 900 hours? (5)

1.1.2 How many lamps will fail between 950 and 1000 hours? (5)

1.1.3 Given the same mean life, what would the standard deviation have to be to ensure that not more than 20% of lamps fail before 916 hours? (5)

1.2 Components are placed into bins containing 100. After inspection of a large number of bins the average number of defective parts was found to be 10 with a standard deviation of 3. Assuming that the same production conditions continue, except that bins containing 300 were used:

1.2.1 What would be the average number of defective components per larger bin? (5)

1.2.2 What would be the standard deviation of the number of defectives per larger bin? (5)

1.2.3 How many components must each bin hold so that the standard deviation of the number of defective components is equal to 1% of the total number of components in the bin? (5)

QUESTION TWO (35 MARKS)

2.1 Researchers at the European Centre for Road Safety Testing are trying to find out how the age of cars affects their braking capability. They test a group of ten cars of differing ages and find out the minimum stopping distances that the cars can achieve. The results are set out in the table below:

Car ages and stopping distances

Car   Age (months)   Minimum stopping distance at 40 kph (metres)
A     9              28.4
B     15             29.3
C     24             37.6
D     30             36.2
E     38             36.5
F     46             35.3
G     53             36.2
H     60             44.1
I     64             44.8
J     76             47.2

REQUIRED:

2.1.1 Draw a scatter diagram from the data. (11)

2.1.2.1 Compute Spearman’s correlation coefficient (rho). (12)

2.1.2.2 Compute Pearson’s correlation coefficient (r). (12)

QUESTION THREE (35 MARKS)

3.1 In Africa it is estimated that 30% of all drivers have some kind of medical aid. What is the probability that in a sample of 10 drivers:

3.1.1 Exactly 4 will have a medical aid. (8)
3.1.2 At least 2 will have a medical aid. (8)
3.1.3 More than 9 will have a medical aid. (9)

3.2 Sun Couriers, a parcel delivery company, has found that the delivery time of parcels to clients in the Durban metropolitan area after airport collection is normally distributed with a mean delivery time of 45 minutes (µ = 45) and a standard deviation of 8 minutes (σ = 8). For a newly arrived consignment at Durban airport, what is the probability that a randomly selected parcel will take between 45 and 51 minutes to deliver to the client? (10)
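As a rough check on Question Three, the binomial and normal probabilities can be computed directly. The sketch below assumes the 10 drivers are sampled independently with a constant 30% chance of having medical aid, which is what the binomial model requires; the helper name `binom_pmf` is hypothetical.

```python
from math import comb
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.30

# 3.1.1: exactly 4 drivers have medical aid.
p_exactly_4 = binom_pmf(4, n, p)

# 3.1.2: at least 2, using the complement of "0 or 1".
p_at_least_2 = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)

# 3.1.3: more than 9 out of 10 means all 10.
p_more_than_9 = binom_pmf(10, n, p)

# 3.2: delivery time is normal with mean 45 and sd 8 minutes.
delivery = NormalDist(mu=45, sigma=8)
p_45_to_51 = delivery.cdf(51) - delivery.cdf(45)

print(round(p_exactly_4, 4), round(p_at_least_2, 4), p_more_than_9)
print(round(p_45_to_51, 4))
```

Using the complement in 3.1.2 avoids summing nine separate terms, which is also the quicker route when working by hand.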

TOTAL 100
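For the car data in Question Two of this paper, both correlation coefficients can be sketched in plain Python. The rank-difference formula rho = 1 - 6Σd² / (n(n² - 1)) is assumed here to be the form intended by the guide; because two cars share a stopping distance of 36.2 m, tied values are given their average rank, and the formula is then only approximate. Function names are hypothetical.

```python
from math import sqrt

ages = [9, 15, 24, 30, 38, 46, 53, 60, 64, 76]
distances = [28.4, 29.3, 37.6, 36.2, 36.5, 35.3, 36.2, 44.1, 44.8, 47.2]

def average_ranks(values):
    """Rank from smallest (1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Rank-difference form of Spearman's coefficient."""
    n = len(x)
    d_squared = sum((a - b) ** 2
                    for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

def pearson_r(x, y):
    """Product-moment correlation from sums of squares and cross-products."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rho = spearman_rho(ages, distances)
r = pearson_r(ages, distances)
print(round(rho, 3), round(r, 3))
```

Both coefficients come out strongly positive, which agrees with the upward drift a scatter diagram of these points would show.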


TOPIC 8 ___

ADDENDUM 621 (B): TYPICAL EXAMINATION QUESTIONS

___

PC TRAINING & BUSINESS COLLEGE HIGHER EDUCATION AND TRAINING

FACULTY OF BUSINESS, ECONOMICS & MANAGEMENT SCIENCES BUSINESS STATISTICS 621

1ST SEMESTER FINAL EXAMINATION

Duration: 2 Hours Marks: 100 Date: xxxxxxxx Examiners: xxxxxxxxx Moderator: xxxxxxxxx

This paper consists of 4 Questions of 10 pages including this page.

PLEASE NOTE THE FOLLOWING:

1) Ensure that you are writing the Correct Examination paper, and that there are no missing pages.

2) You are obliged to enter your learner number and centre name on all answer sheets. The answer sheets provided are the property of the Business College and all extra sheets must be handed to your invigilator before you leave the examination room. Number your answer sheets and ensure that they are stapled in the correct sequence.

3) If you are found copying or if there are any documents / study material in your possession, or writing on parts of your body, tissue, pencil case, desk etc, your answer book will be taken away from you and endorsed accordingly. Appropriate disciplinary measures will be taken against you for violating the code of conduct of the Business College Examinations Board. Therefore if any of these materials are on your person you are requested to hand these over to your invigilator before the official commencement of this paper.

4) This question paper has four sections. Answer ALL questions.

SUGGESTED TIME REQUIRED TO ANSWER THIS QUESTION PAPER

NUMBERS   QUESTIONS        MARKS   TIME IN MINUTES
1         Question One     25      30
2         Question Two     25      30
3         Question Three   25      30
4         Question Five    15      18
5         Question Six     10      12

Question One (25 Marks)

The production foreman of the Oros Fruit Company estimates that the mean sales of oranges is 4 700 and the standard deviation 500 oranges. In view of this scenario, calculate the probability that sales will be:

1.1 More than 5 500 oranges (4)
1.2 More than 4 500 oranges (5)
1.3 Less than 4 900 oranges (4)
1.4 Between 4 500 and 4 900 oranges (6)
1.5 Between 4 900 and 5 500 oranges (6)

Question Two (25 Marks)

Shoprite (Pty) Ltd sells a variety of products to different consumers in South Africa. An internal statistician wants to model the relationship between sales (Y) and variety of products (X) in a store in Pretoria. Consider the following readings (in R1000) from the books of Shoprite:

Time (Months)   Sales (Y)   Product (X)
Jan             38          68
Feb             52          84
Mar             42          74
Apr             35          57
May             70          90
Jun             62          80
Jul             46          66
Aug             44          64

You are required to calculate the following:
2.1 Spearman’s correlation coefficient (10)
2.2 Pearson’s correlation coefficient (10)
2.3 Determine r² (2)
2.4 Determine R² (3)

Question Three (25 Marks)

The following ages of the first 19 supporters at a stadium entrance were recorded during the 2010 FIFA World Cup in South Africa:

17, 20, 18, 19, 18, 9, 6, 12, 10, 13, 15, 22, 21, 24, 16, 17, 16, 16, 14

You are required to find/calculate the following:
3.1 40th percentile (5)
3.2 60th percentile (5)
3.3 Trimean (10)
3.4 Geometric mean (5)
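The measures asked for in Question Three can be sketched as follows. Two assumptions are made and should be checked against the guide's formula sheet: the percentile position is taken as p(n + 1)/100 with linear interpolation between neighbouring ordered values, and the trimean is taken as Tukey's (Q1 + 2 × median + Q3) / 4. The function name `percentile` is hypothetical.

```python
from statistics import geometric_mean

ages = sorted([17, 20, 18, 19, 18, 9, 6, 12, 10, 13,
               15, 22, 21, 24, 16, 17, 16, 16, 14])

def percentile(sorted_values, p):
    """p-th percentile using the position p(n + 1)/100 (assumed convention)."""
    n = len(sorted_values)
    position = p * (n + 1) / 100          # 1-based position in the ordered data
    lower = int(position)
    fraction = position - lower
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + fraction * (above - below)

p40 = percentile(ages, 40)
p60 = percentile(ages, 60)

# Trimean (assumed Tukey definition): weighted average of the quartiles.
q1, median, q3 = (percentile(ages, p) for p in (25, 50, 75))
trimean = (q1 + 2 * median + q3) / 4

# Geometric mean: the n-th root of the product of the n ages.
gm = geometric_mean(ages)

print(p40, p60, trimean, round(gm, 2))
```

With 19 observations the positions for the 25th, 40th, 50th, 60th and 75th percentiles are all whole numbers, so no interpolation is needed for this data set.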

Formulae


PC TRAINING & BUSINESS COLLEGE

HIGHER EDUCATION AND TRAINING

FACULTY OF BUSINESS, ECONOMICS & MANAGEMENT SCIENCES

BUSINESS STATISTICS 621

1ST SEMESTER FINAL EXAMINATION

Duration: 2 Hours Marks: 100 Date: xxxxxxxx

Examiners: xxxxxxxxx Moderator: xxxxxxxxx

This paper consists of 4 Questions of 10 pages including this page.

PLEASE NOTE THE FOLLOWING:

1) Ensure that you are writing the Correct Examination paper, and that there are no missing pages.

2) You are obliged to enter your learner number and centre name on all answer sheets. The answer sheets provided are the property of the Business College and all extra sheets must be handed to your invigilator before you leave the examination room. Number your answer sheets and ensure that they are stapled in the correct sequence.

3) If you are found copying or if there are any documents / study material in your possession, or writing on parts of your body, tissue, pencil case, desk etc, your answer book will be taken away from you and endorsed accordingly. Appropriate disciplinary measures will be taken against you for violating the code of conduct of the Business College Examinations Board. Therefore if any of these materials are on your person you are requested to hand these over to your invigilator before the official commencement of this paper.

4) This question paper has four sections. Answer ALL questions.

SUGGESTED TIME REQUIRED TO ANSWER THIS QUESTION PAPER

NUMBERS   QUESTIONS        MARKS   TIME IN MINUTES
1         Question One     25      30
2         Question Two     25      30
3         Question Three   25      30

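Question One (25 Marks)

The production foreman of the Oros Fruit Company estimates that the mean sales of oranges is 4 700 and the standard deviation 500 oranges.

1.1 Calculate the probability that sales will be:

1.1.1 More than 5 500 oranges [4]
z = (5 500 - 4 700) / 500 = 1.6
P(X > 5 500) = P(Z > 1.6) = 0.0548 √√√

1.1.2 More than 4 500 oranges [5]
z = (4 500 - 4 700) / 500 = -0.4
P(X > 4 500) = P(Z > -0.4) = 0.6554 √√√

1.1.3 Less than 4 900 oranges [4]
z = (4 900 - 4 700) / 500 = 0.4
P(X < 4 900) = P(Z < 0.4) = 0.6554 √

1.1.4 Between 4 500 and 4 900 oranges [6]
P(4 500 < X < 4 900) = P(-0.4 < Z < 0.4) = 0.6554 - 0.3446 = 0.3108 √√√√

1.1.5 Between 4 900 and 5 500 oranges [6]
P(4 900 < X < 5 500) = P(0.4 < Z < 1.6) = 0.9452 - 0.6554 = 0.2898 √√√√√√

Question Two (25 Marks)

The following are Business Statistics CA Test 1 marks from 421 Church Street:

28, 35, 38, 42, 44, 46, 52, 57, 62, 66, 66, 68, 70, 74, 84, 87

2.1 66

2.2 10% of 28 = 0.1 × 28 = 2.8
10% of 87 = 0.1 × 87 = 8.7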


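Mean trimmed 20% = (2.8 + 35 + 38 + 42 + 44 + 46 + 52 + 57 + 62 + 66 + 66 + 68 + 70 + 74 + 84 + 8.7) / 16

= 815.5 / 16

= 50.97 ≈ 51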

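2.3 60th percentile is at position (60 / 100)(n + 1) = 0.6 × 17 = 10.2

The 10th and 11th ordered marks are both 66, so the 60th percentile = 66 + 0.2(66 - 66) = 66

2.4 Q1 position = (n + 1) / 4 = 17 / 4 = 4.25th position

Therefore Q1 = 42 + 0.25(44 - 42) = 42.5

Q3 position = 3(n + 1) / 4 = 51 / 4 = 12.75th position

Therefore Q3 = 68 + 0.75(70 - 68) = 69.5

SIQR = (Q3 - Q1) / 2 = (69.5 - 42.5) / 2 = 13.5

2.5 = 4.67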


Question Three (25 Marks)

A car dealer at BMW has recorded the unit prices and quantities sold of three models of a particular make of car from 1987 to 1990. The quantities sold and unit selling prices for 1987 and 1990 respectively are given in the following table:

            1987                                    1990
Car Model   Price (1000)   Quantity (units sold)   Price (1000)   Quantity (units sold)
A           40             10                      50             15
B           25             50                      30             60
C           10             25                      18             20

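3.1 Find the price relative for Model B. (3)

Price relative = (30 / 25) × 100 = 120

3.2 Compute an appropriate composite price index for 1990 using 1987 as base:

3.2.1 Use a weighted aggregates method based on the Laspeyres approach. (5)

a) Lp = (Σ p1·q0 / Σ p0·q0) × 100 = (2 450 / 1 900) × 100 = 128.9

3.2.2 Use a weighted average of price relatives method, assuming quantities are held constant at 1990 levels. (4)

b) Weighted average index = Σ[(p1 / p0 × 100) × (p0·q1)] / Σ(p0·q1) = (2 910 / 2 300) × 100 = 126.5

3.2.3 Find the quantity relative for Model C. (3)

c) Quantity relative = (20 / 25) × 100 = 80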
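The index calculations in 3.1 and 3.2 can be sketched as below for checking purposes. In the sketch, p0 and q0 denote 1987 prices and quantities and p1 and q1 the 1990 figures. For 3.2.2 the weights are assumed to be base-year prices multiplied by 1990 quantities (p0 × q1), which is one common reading of "quantities held constant at 1990 levels"; the memo's own working is not available to confirm this. The dictionary layout and variable names are hypothetical.

```python
# model: (p0, q0, p1, q1) = 1987 price, 1987 quantity, 1990 price, 1990 quantity
models = {
    "A": (40, 10, 50, 15),
    "B": (25, 50, 30, 60),
    "C": (10, 25, 18, 20),
}

# 3.1: price relative for Model B (1990 price as a percentage of 1987 price).
p0_b, _, p1_b, _ = models["B"]
price_relative_b = 100 * p1_b / p0_b

# 3.2.1: Laspeyres weighted aggregates index, using base-year quantities.
laspeyres = 100 * (
    sum(p1 * q0 for p0, q0, p1, q1 in models.values())
    / sum(p0 * q0 for p0, q0, p1, q1 in models.values())
)

# 3.2.2: weighted average of price relatives.
# Assumption: each relative is weighted by the value p0 * q1.
weights = {m: p0 * q1 for m, (p0, q0, p1, q1) in models.items()}
relatives = {m: 100 * p1 / p0 for m, (p0, q0, p1, q1) in models.items()}
weighted_relatives = (
    sum(relatives[m] * weights[m] for m in models) / sum(weights.values())
)

# 3.2.3: quantity relative for Model C.
_, q0_c, _, q1_c = models["C"]
quantity_relative_c = 100 * q1_c / q0_c

print(price_relative_b, round(laspeyres, 1),
      round(weighted_relatives, 1), quantity_relative_c)
```

With these weights the weighted average of relatives reduces algebraically to Σ p1·q1 / Σ p0·q1 × 100, since p0 cancels in each term of the numerator.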

3.3 Discuss the importance of business statistics in the business world. (10)

Statistics is an important field of study, and its importance in management cannot be ignored. Statistical models are used in making significant company decisions. All forms of data available to the company can be organized using statistical tools, and statistical formulas can be used to analyze and interpret the data. Once conclusions have been drawn using statistical techniques, management can make important decisions. Decisions related to new product launches, business extension, product diversification, and investment are usually based on statistical methods. Some of the common statistical methods include probability distributions, regression analysis, and graphical analysis. √√√√√√√√√