Describing a Variable

Statistics Essentials

Copyright © SS&C Technologies, Inc. All rights reserved.

Zoologic™ Learning Solutions

Course: Statistics Essentials

Lesson 1: Describing a Variable

Course Introduction

Statistics Essentials is based upon the content and close collaboration of Professor Peter Klibanoff, Associate Professor of Managerial Economics and Decision Sciences at the Kellogg School of Management at Northwestern University. A member of the Kellogg faculty since 1994, Professor Klibanoff holds a Ph.D. in Economics from the Massachusetts Institute of Technology and an A.B. in Applied Mathematics from Harvard University. At Kellogg, Professor Klibanoff received the Chairs' Core Teaching Award in 1998. Statistics Essentials combines Professor Klibanoff's content expertise and teaching skills with an effective learn-by-doing instructional design to provide a highly interactive learning experience. This course guides you through fundamental principles and techniques of statistics and how they are used to make more informed business decisions.

Introduction to Describing a Variable

This course will teach you how to employ statistics effectively to make informed business decisions. We begin by studying probability and probability distributions as a way to help describe and quantify the uncertainties that surround decision-making. To assist us in this process, we will consider the challenges of Café Caliente, a global chain of coffee houses. We will use probability distributions to help the managers at Café Caliente describe important quantities, such as average sales, that fluctuate from week to week and from location to location. The concept of probability will serve as our foundation as we explore statistics further.

Throughout this course, you will need a calculator to complete several of the exercises. If you have Windows software, one is available by accessing "Programs, Accessories, Calculator". Or you may use your own hand-held version.

Café Caliente International sells gourmet coffee and has been so successful that, with recent rapid growth, it now operates 2,700 stores in the US and another 525 abroad. That success has brought with it some serious problems: profit margins are eroding, product and service quality are harder to maintain, and the expanding business volume is stretching the operations infrastructure. But that's not all. The competition sees these problems and, sensing its rival's vulnerability, is responding aggressively. Café Caliente needs strong solutions based on accurate information about its environment.

When are the fundamental concepts of statistics most useful?

Statistical concepts are most useful when: 1) some key information is unknown and 2) data relevant to this information can be obtained. For example, in deciding whether or not to issue a credit card to a consumer, it is important to have a good idea of how likely any given consumer is to default on the card. While this information is not known, it is possible, by looking at the history of other credit card holders, to estimate the probability of default as a function of characteristics that are known, such as age, income, past credit history, etc. You will encounter many other examples of such situations throughout this course and throughout your business life.

Each of these managers wants to achieve a specific objective, but each objective is different from the others. Four different problems requiring four different solutions. And the effectiveness of each solution will pivot on the proper use of statistics. But which statistical concepts or tools for which problem?

We'll see. But first, we need to review some fundamental concepts of statistics. Let's do that within the context of Café Caliente. From there, we'll use that review as the foundation for learning how the managers can apply statistics to solve their problems.

How do I decide what to observe or analyze?

It all depends on what questions you want to ask and answer, or what business problems you want to investigate. As long as you can frame the question and obtain some data that speaks to the question, you can potentially apply statistics.

What do these problems have in common with each other?

These problems each involve uncertainty, randomness, and processes that occur hundreds of times a day across Café Caliente's many locations. In each case, managers may have some intuition that allows them to make educated guesses in deciding which actions to take. This intuition needs to be validated, and the managers' decision making assisted, by analytic tools that support conclusions through the accumulation and evaluation of evidence. Statistics provides the scientific body of knowledge, and the tools that can help managers develop solutions to business problems.

Each of these problems also represents an opportunity for the managers to significantly improve Café Caliente's operations, either through greater efficiencies (cost savings), improved sales (increased revenues) or by reducing potential risk (limiting the firm's liability and the cost to insure against it).

Knowing how to use and interpret statistical results enhances value for a manager and can give the company a leg up on the competition.

Imagine if the managers had perfect knowledge of the traffic flow and order profiles for every time interval of every day at every store across the globe. In that ideal world, accurately predicting how much coffee to buy, how many people to hire, etc., would be easy. But in reality, these random variables fluctuate all the time.

Having said that, these fluctuations usually repeat themselves in identifiable patterns. And, by using statistics, the managers can analyze these patterns and find some predictive reliability. Their starting point is to think of these fluctuations as being determined according to a probability distribution. This is a curve that defines the possible outcomes for a random variable and depicts the likelihood that any given outcome or interval of outcomes will occur. Distributions can be either discrete or continuous, and the mathematics involved differs accordingly.

probability distribution: A plot of potential events and the chance each has of occurring.

variable: A characteristic of a population unit.

Wouldn't it be easier and more valid to just observe a customer's behavior or talk to a few customers instead of using statistics?

It might be easier, but it is unlikely to be more valid. While observing a few customers or interviewing some in depth may certainly help give ideas about where to focus the analysis, it is unlikely to give an accurate picture of the population of customers as a whole. This is what statistics and random sampling are designed to do. Statistics - and more casual methods such as the ones mentioned above - should be thought of as complements, not substitutes.

A discrete probability distribution depicts the likelihood of a finite number of possible outcomes. Imagine the customer service manager — he sees a discrete probability distribution. The number of cups of coffee (or other merchandise) purchased per customer is finite; a customer can buy only a specific number of items. And those items purchased are distinct, separate, and countable.

In a discrete distribution, each specific outcome has a specific probability of occurrence. That's why the probabilities of all possible outcomes add up to 1 (i.e., there is a 100% chance that one of the outcomes will occur).

What happens as the finite set gets bigger or becomes infinite?

In cases like this, a continuous distribution is often used to approximate the discrete distribution. (We will describe continuous distributions on the next screen.) Additionally, a continuous distribution is also sometimes used in cases where a distribution is technically discrete but where it is more convenient to use a continuous distribution. For example, if sales are measured in dollars and cents, this is technically a discrete random variable. Nevertheless, it is much more convenient in many cases and involves no real loss of precision to proceed as if money were perfectly divisible.

A continuous probability distribution is different, as we can see from the perspective of the global purchasing manager. To forecast her purchases of coffee beans for Café Caliente, she needs to estimate how much coffee the stores will likely serve. But since coffee can come in any quantity, there is an infinite number of possible outcomes of that random variable. This problem would be unmanageable if not for a continuous distribution. The purchasing manager can use a continuous distribution to focus on relevant intervals of those outcomes.

The curve that represents a continuous probability distribution is called a probability density. The area beneath the probability density represents the probability associated with the different outcomes. (See the shaded areas on the image above.)

In a continuous probability distribution, what is the probability attached to a single value?

A continuous random variable has an infinite number of possible outcomes. For example, the measurement of monthly rainfall is limited only by the precision of our instruments and by the practical significance of decimal places to human decision-making and interest. There are an infinite number of possible rainfall amounts, as can be easily seen by considering that 20 inches of rainfall could more precisely be measured as 20.1 inches, which could be more precisely measured as 20.11 inches, and so on. Since infinite numbers of values are possible within a continuous distribution, it stands to reason that the probability of any single value is zero.

Probability within a continuous distribution is assigned to a range of values. For example, when we say 21.5 inches of rain fell in April, we often mean something like "between 21.45 and 21.55 inches of rain fell in April (rounding to one decimal place)." You would assign a positive probability value to this range of outcomes of the random variable.

One implication of a single value having zero probability is that the probability of having more than 21.5 inches of rain in April is always the same as the probability of having 21.5 inches of rain or more in April.
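
As a rough illustration of these points, the sketch below assumes a deliberately simple model in which April rainfall is equally likely to fall anywhere between 0 and 40 inches. That uniform model and its limits are hypothetical and are not part of the course. The helper `prob_between` is likewise an illustrative name. The sketch shows that probability attaches to ranges, that it shrinks toward zero as the range narrows, and that "more than 21.5" and "21.5 or more" therefore have the same probability.

```python
# Hypothetical model: April rainfall is equally likely to fall anywhere
# between 0 and 40 inches (a flat density, assumed only for illustration).
LOW, HIGH = 0.0, 40.0


def prob_between(a: float, b: float) -> float:
    """Area under the flat density between a and b, clipped to [LOW, HIGH]."""
    a, b = max(a, LOW), min(b, HIGH)
    return max(b - a, 0.0) / (HIGH - LOW)


# The probability shrinks toward zero as the interval around 21.5 narrows.
for half_width in (0.5, 0.05, 0.001, 0.0):
    p = prob_between(21.5 - half_width, 21.5 + half_width)
    print(f"P(21.5 +/- {half_width}) = {p:.6f}")

# A single point carries zero probability, so "more than 21.5" and
# "21.5 or more" have the same probability.
p_more_than = prob_between(21.5, HIGH)                 # P(X > 21.5)
p_at_least = p_more_than + prob_between(21.5, 21.5)    # adds P(X = 21.5), which is 0
print(p_more_than == p_at_least)
```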

Is either type of distribution inherently more useful than the other?

Both discrete and continuous distributions are useful. The analysis of lines or queues (e.g., lines of customers waiting to be served, lines of parts waiting to be used in an assembly process) often makes use of discrete distributions. Much of the statistics developed in this course uses continuous distributions.

For Café Caliente's managers, seeing the outcomes of the random variable as a probability distribution is essential. In addition, the managers will want to calculate three summary statistics for the distribution - the mean, variance, and standard deviation. These summary statistics provide an efficient way to characterize distributions and communicate the essence of those characterizations to others.

Let's first look at the mean of the distribution. Think of the mean as the average. The mean itself might not be one of the actual possible outcomes, but it will be a good measure of the central tendency of the distribution. Let's revisit our distribution of the number of items purchased per visit. As shown above, we calculate the mean for a discrete population by multiplying each possible outcome by its associated probability, and then totaling the results.

Apart from the mean, what other two statistics represent the central tendency of a distribution?

Two other statistics that represent the central tendency of a distribution are the median and the mode. The median represents the value for which there is exactly the same probability of the random variable being above the median as there is of it being below the median. It is the 50th percentile. The mode is the value that occurs most frequently.

How is calculating the mean of a probability distribution similar to calculating the mean for any group of numbers?

When you calculate the mean of any fixed collection of numbers, you sum all the values and divide by the number of values in the collection (e.g., the mean of 4,5,6,7 = (4+5+6+7)/4 = 5.5). Notice that this is the same as multiplying each value by 1/n and adding up the results ((4 • 1/4)+(5 • 1/4)+(6 • 1/4)+(7 • 1/4) = 5.5). When you calculate the mean of a probability distribution, you multiply each value by its probability and then sum the resulting products. Conceptually, the probability in the latter calculation is the equivalent of the 1/n factor in the former calculation. In the collection of numbers 4,5,6,7, each number has an equal probability of occurrence, which is ¼ (the 1/n value).
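
The equivalence described above can be checked with a short Python sketch. The first part uses the numbers 4, 5, 6, 7 from the text. The second part applies the same probability-weighted sum to a distribution of items purchased per visit, whose probabilities are made up for illustration and are not Café Caliente's actual figures.

```python
# Mean of a fixed collection: sum the values and divide by how many there are.
values = [4, 5, 6, 7]
simple_mean = sum(values) / len(values)

# Same result when each value is weighted by an equal probability of 1/n.
equal_probs = [1 / len(values)] * len(values)
weighted_mean = sum(x * p for x, p in zip(values, equal_probs))

# Hypothetical discrete distribution of items purchased per visit
# (these probabilities are assumed for illustration only).
items = [1, 2, 3, 4]
probs = [0.5, 0.3, 0.15, 0.05]
assert abs(sum(probs) - 1.0) < 1e-9  # probabilities of all outcomes sum to 1
dist_mean = sum(x * p for x, p in zip(items, probs))

print(simple_mean, weighted_mean, dist_mean)
```

Under these assumed probabilities the mean is about 1.75 items per visit, a value that is not itself a possible outcome.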

For this discussion, let's use the above continuous distribution to depict the number of customers served daily at one of Café Caliente's Chicago locations. Note that the mean falls at the exact middle of this distribution; that is, half the area under the curve falls on either side of the mean. This will occur whenever a distribution is symmetric.

The mean is helpful in understanding the probability distribution, but it's not enough. We usually calculate two more, related statistics to really understand the behavior of the random variable. Let's take a look at the variance, and then, the standard deviation. These two summary statistics will help describe how spread out the distribution is around the mean.

mean: A mathematical calculation of the mid-point in a statistical sample. Also known as the average.

Is it always the case that the mean will be in the exact middle of the distribution?

No. Another measure, called the median, is the number such that half the probability is to the right and half the probability is to the left. In a symmetric distribution the mean and median are the same, but in general they will differ.
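
A small Python sketch can make the difference concrete. The daily customer counts below are hypothetical, chosen only so that one list is symmetric and the other has a single unusually busy day.

```python
from statistics import mean, median

# Hypothetical daily customer counts (illustrative values only).
symmetric = [200, 210, 220, 230, 240]   # evenly spread around the middle
skewed = [200, 210, 220, 230, 400]      # one unusually busy day

print(mean(symmetric), median(symmetric))  # both 220
print(mean(skewed), median(skewed))        # mean 252, median 220
```

The single large value pulls the mean upward while leaving the median unchanged.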

Is there any difference between calculating the mean in a discrete probability distribution vs. the mean in a continuous probability distribution?

The only difference is that in the case of a continuous distribution, the number used to multiply each value is the probability density, and the "adding up" is done by integration.
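
In symbols, the two calculations can be written side by side. The notation is an assumption made here rather than the course's own: p(x_i) stands for the probability of outcome x_i, and f(x) for the probability density.

```latex
\mu = \sum_{i} x_i \, p(x_i) \quad \text{(discrete)}
\qquad
\mu = \int_{-\infty}^{\infty} x \, f(x) \, dx \quad \text{(continuous)}
```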

Do you suppose that the distribution of number of customers served differs between Café Caliente's suburban stores and its urban stores? Sure, and the customer service manager no doubt realizes that as well. But he needs a way to quantify this. Let's assume that the above depicts the distributions of customers served at two locations. Let's further assume for the moment that both probability distributions have the same mean.

These probability distributions differ quite a bit. The probabilities in the flatter distribution are more spread out, or dispersed, indicating more variable outcomes than those of the taller distribution. Greater dispersion suggests less predictability - of purchasing needs, staffing, etc. Can we quantify the difference in dispersion, so as to better understand these patterns? Yes, by calculating the second summary statistic, the variance.

variance: For a set of random numbers, the measure of the average squared distance between the numbers and the mean. The square root of the variance is the standard deviation, also called the volatility.

What does it mean for a distribution to be symmetric?

A distribution is symmetric when you can draw a vertical line through the middle of the distribution curve and the resulting halves are mirror images of each other.

The variance is defined as the average squared deviation from the mean. A very large variance indicates that values very far away from the mean may not be uncommon.

The good thing about the variance is that it can be compared with variances of other probability distributions. But since the variance uses squared units as a measure, it is difficult to fully understand it and to talk about it with others. For example, let's say that the variance of the downtown store's distribution is 225, which in variance terms is actually "225 customers squared." How do you interpret this? Does it make practical sense? Not really. This is where the third summary statistic, the standard deviation, will come in handy.

How exactly do you calculate the variance?

The variance of the population (σ²) is calculated by (1) taking the difference between each value in the population (x_i) and the mean (µ), (2) squaring the differences, (3) summing all these squared terms, and (4) dividing the sum by the number of values in the population (N).

The formula is:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$
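
The four steps described above can be sketched in Python. The population of daily customer counts is hypothetical. The result is compared with `statistics.pvariance` from the standard library, which computes the same population variance.

```python
from statistics import pvariance

# Hypothetical population of daily customer counts (illustrative values only).
population = [205, 235, 220, 200, 240]

n = len(population)
mu = sum(population) / n                              # population mean
squared_devs = [(x - mu) ** 2 for x in population]    # steps (1) and (2)
variance = sum(squared_devs) / n                      # steps (3) and (4)

# The standard library's population variance should agree.
print(mu, variance, pvariance(population))
```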

Why does the variance consider squared deviations from the mean?

If we took all the deviations from the mean and averaged them without squaring, the positive differences would cancel with the negative ones and we would always sum to zero, which would not be useful. Alternative measures of spread (i.e., dispersion), such as average absolute deviation (where instead of squaring the differences, we take the absolute value) are occasionally used, but are much less common than variance, partly because they are harder to work with mathematically.
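
To see the cancellation directly, here is a minimal sketch using the numbers 4, 5, 6, 7 as an illustrative population. It computes the plain sum of deviations, the average absolute deviation, and the variance.

```python
# Illustrative population (any set of numbers would behave the same way).
values = [4, 5, 6, 7]
mu = sum(values) / len(values)

deviations = [x - mu for x in values]
sum_of_deviations = sum(deviations)                              # 0.0: positives cancel negatives
avg_abs_deviation = sum(abs(d) for d in deviations) / len(values)  # 1.0
variance = sum(d ** 2 for d in deviations) / len(values)         # 1.25

print(sum_of_deviations, avg_abs_deviation, variance)
```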

Why is it easier to order inventory and staff for store 1?

Since store 1 has a smaller variance of customers around the mean, its traffic patterns are much more predictable than those of store 2. At store 2, there is a greater possibility that an unexpectedly large number of customers will walk in on any random day. Because the manager of store 2 cannot predict when these peaks in traffic will occur, he must either overstaff and carry greater inventory (both of which introduce significant expense), or risk forgoing revenues due to his inability to satisfy customer demand. The manager of store 1 does not have to be prepared for such contingencies.

The standard deviation is a measure of the dispersion of a probability distribution that is easier to interpret than the variance. The standard deviation is the square root of the variance. Converting the variance back into the original units of measurement (by taking the square root) makes the dispersion easier to understand and more meaningful. Now the dispersion and the mean are in the same units as the variable. Let's suppose that, in our example, the standard deviation at the downtown location is 15 customers (the square root of the variance of 225). With a mean of 220 customers, 220 plus or minus 15 customers (205 to 235) represents all the outcomes within one standard deviation of the mean.

What if the variance is zero?

A zero variance tells us that the variable always takes on the mean value, which indicates that the mean tells us everything we need to know about the population. If none of the values in the population varies from the mean, we have a perfectly predictable population — all the values in the population are identical.

Can you provide an example of why the standard deviation is easier to interpret than the variance?

Say you are assessing the salaries of consultants and find that the mean is $90,000 and the variance is "9,000,000 square dollars." A unit of measure that is defined as "square dollars" is meaningless in trying to understand and communicate the dispersion around the mean of $90,000. If you instead take the square root of the variance, you obtain the standard deviation of $3,000. Your measure of the dispersion is now in the same units of measurement as the mean, making it easier to understand and communicate.
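
The conversions in this lesson are simple enough to check in a few lines of Python. The sketch uses the figures quoted in the text: a variance of 225 with a mean of 220 for the downtown store, and a variance of 9,000,000 for consultant salaries. The variable names are illustrative.

```python
import math

# Figures from the text: variance of 225 "customers squared" at the downtown
# store, and 9,000,000 "square dollars" for consultant salaries.
sd_customers = math.sqrt(225)        # 15.0 customers
sd_salary = math.sqrt(9_000_000)     # 3000.0 dollars

# One standard deviation either side of the downtown mean of 220 customers.
mean_customers = 220
print(mean_customers - sd_customers, mean_customers + sd_customers)  # 205.0 235.0
print(sd_salary)
```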

What is the formula for standard deviation?
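
Since the standard deviation is the square root of the variance, the population standard deviation follows directly from the variance calculation described earlier, using the same symbols (x_i for each value, µ for the mean, N for the number of values):

```latex
\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}
```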

We've been using a continuous probability distribution to depict patterns of a random variable from Café Caliente's operations. And we're using three summary measures (the mean, the variance and the standard deviation) to help describe that distribution. One very common and useful probability distribution that is completely described by its mean and standard deviation is the normal distribution.

Why is the normal distribution so useful? Many random variables that occur naturally in the business world (and beyond) or that are constructed in the process of statistical analyses are normally distributed. As a result, the normal distribution is quite common and most professionals are familiar with it. Even casual users of statistics can easily identify the probabilities associated with the outcomes of a random variable that is normally distributed. All the casual user needs to know is the mean and the standard deviation of the distribution.

standard deviation: A measure of the extent to which results vary from the mean, represented by sigma (σ). Also see volatility.

normal distribution: Defined by a mean and a standard deviation, a probability distribution that forms a symmetrical bell-shaped curve around the mean.

What is the mathematical formula for a normal distribution?
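
For reference, the standard textbook form of the normal probability density, with mean µ and standard deviation σ, is:

```latex
f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```

Only µ and σ appear as parameters, which is why those two numbers completely specify the distribution.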

What are some characteristics of a normal distribution?

Some characteristics of a normal distribution are that (1) the curve is bell-shaped and symmetrical (i.e., the two halves are mirror images of each other), (2) the mean, median and mode are equal and located at the peak, (3) the total area under the curve is equal to one, (4) the curve is asymptotic, which means it approaches but never touches the x-axis as it extends further and further away from the mean, and most important, (5) there is exactly one normal distribution for every possible value of a mean and a standard deviation. Once you know the mean and the standard deviation of a distribution and assume that the distribution is 'normal,' you have completely specified the probability distribution.

What other types of probability distributions are there?

There are many other types of probability distributions in addition to ones discussed in this course. Some of the more commonly used families of distributions include the chi-squared, the F-distributions, the beta distributions, the gamma distributions, the Poisson, the binomial, the geometric and the exponential distributions. Most texts on probability will include a description of such distributions.

Earlier, we assumed that the probability distributions for the urban and suburban stores had the same mean, which made comparison easier. In reality, that's unlikely to be the case. Is there then any standard way to perform comparisons across stores? Yes. Assuming that the random variables we wish to compare across stores are normally distributed, we can make those stores comparable by standardizing their normal distributions: subtract the mean from each value and divide the result by the standard deviation. This standardization converts each normal distribution, with whatever mean and standard deviation, into a standard normal distribution with a mean of 0 and a standard deviation of 1. The standard normal distribution is also referred to as the z-distribution.

With a standard normal distribution, the managers can easily calculate probabilities associated with specific outcomes of a random variable. In addition, they can also compare the different probabilities of outcomes across multiple stores in Café Caliente's global system. From there, they can make decisions with quantifiable levels of confidence.

Let's look at some examples.

What notation is used in describing a normal distribution with a certain mean and certain standard deviation?

The usual notation for a normal distribution with mean µ and standard deviation σ (and thus variance σ²) is N(µ, σ²).

Imagine you are the manager of the downtown Chicago location and assume for the moment that the number of customers you serve per day is normally distributed with a mean of 510 and a standard deviation of 25. What's the likelihood that you may have at least 565 customers tomorrow? How do you go about determining the probability?

The answer is to convert this question into the equivalent question on the z-distribution by standardizing. The z-value that corresponds to 565 is 2.2 [(565-510)/25]. It is easy to use a spreadsheet program or statistical tables to find the probability associated with a z-value of 2.2. Doing this reveals that a z-value exceeds 2.2 with a probability of 0.014 (1.4%).

If 565 customers in a day presents a staff scheduling problem or an inventory management problem, relax. The probability that this will happen tomorrow is less than 2%.
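
In place of a spreadsheet or printed table, the same lookup can be sketched with Python's standard-library `statistics.NormalDist`. The mean, standard deviation, and threshold are the Chicago figures from the text. The variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 510, 25          # Chicago store figures from the text
x = 565

z = (x - mean) / sd                         # standardize: 2.2
p_exceed = 1 - NormalDist().cdf(z)          # P(Z > 2.2), about 0.014

# Same answer without standardizing, using the store's own distribution.
p_direct = 1 - NormalDist(mean, sd).cdf(x)

print(z, round(p_exceed, 3), round(p_direct, 3))
```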

Space Shuttle Challenger: Could an Improved Interpretation of O-Rings Data Have Prevented the Disaster?

On the morning of January 28, 1986, NASA launched the space shuttle Challenger. About 73 seconds into the flight, the Challenger exploded, killing all seven people on board, including the first teacher in space. How could a mistake of this magnitude occur, and could the proper use of statistics have prevented it?

The primary cause of the explosion was the failure of the O-rings on the solid rocket boosters. Why did they fail on this particular launch? The ambient temperature at launch time on January 28 at Kennedy Space Center was 26 degrees Fahrenheit. Previously, the lowest temperature of any launch was 53 degrees and even then, the O-rings experienced some erosion. On that fateful day, the engineers at Morton-Thiokol could not confidently predict the shuttle results, since they had not conducted dynamic tests of the boosters at temperature levels below 40 degrees Fahrenheit.

On what basis did the engineers decide to launch? They looked at two simple charts, which provided a chronological display of O-ring erosion from multiple historical launches. Regrettably, this display raised no red flags. A differently designed display, one that depicted the probability distribution of O-ring erosion against temperature, would have revealed a frightening outlook at progressively lower air temperatures.

Had they designed their analysis with more insight, and used basic statistical concepts to inform their views, they might have been inclined not to launch the shuttle, even when heavily pressured by the senior executives at NASA to get the job done.

Where do I find z-distribution values?

You can find z-distribution values by using your spreadsheet program or statistics software, or in the tables at the back of most statistics textbooks.

Suppose you need to compare the downtown San Diego store with the downtown Chicago store. Can standardization help you? Yes, by standardizing each of the two different stores to the standard normal distribution, you can compare the two stores to each other. Suppose Chicago staffing is the standard to be followed at Café Caliente; that is, you staff at a level to serve 98.6% (.986) of all possible outcomes of daily customer traffic. In the Chicago location we saw this meant staffing for 565 customers a day (staffing for 98.6% of all occurrences is equivalent to saying that 1.4% of outcomes will not be covered by this staffing level).

Now, you are the San Diego location manager and your store's distribution has a mean of 390 and standard deviation of 15. What level are you staffing for? Said differently, what is your staffing level that corresponds to a z-value of 2.2? The answer is 423 [390 + (2.2 • 15)], since 423 is 2.2 standard deviations above the mean of 390.


The probability of exceeding 565 daily customers in the Chicago location is equivalent to the probability of exceeding 423 daily customers in the San Diego location. Each location should staff to this level if their goal is to service 98.6% of all daily customer traffic levels.
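The San Diego calculation can be sketched in a few lines of Python. This is only an illustration under the lesson's stated figures (mean 390, standard deviation 15, z-value 2.2); the function and variable names are hypothetical, and the normal model is the same assumption the lesson makes.

```python
from statistics import NormalDist

def level_for_z(mean, std_dev, z):
    """Convert a z-value back into the original units (customers per day)."""
    return mean + z * std_dev

# San Diego figures from the lesson.
san_diego = NormalDist(mu=390, sigma=15)
z_target = 2.2  # z-value behind the Chicago staffing standard

staffing_level = level_for_z(san_diego.mean, san_diego.stdev, z_target)

# Probability that daily traffic exceeds the staffing level.
p_exceed = 1 - san_diego.cdf(staffing_level)

print(f"Staff for about {staffing_level:.0f} customers a day")
print(f"Chance of exceeding that level: {p_exceed:.3f}")
```

The exceedance probability comes out at roughly 1.4%, which matches the share of outcomes left uncovered by the Chicago standard.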

When do I need to use statistical charts?

You need to use statistical charts to determine the values of a probability distribution only when your spreadsheet or other software is not available. Your spreadsheet can calculate the areas under the curve for any mean and standard deviation of a normal distribution, as well as for t-distributions (and others). Since spreadsheet programs are now so widely used, the tables in statistics textbooks are becoming more and more obsolete.


Another very common type of distribution in statistics is called the t-distribution. We use the t-distribution as a model when we need to estimate both the mean and the standard deviation of a distribution.

All t-distributions, like the standard normal distribution (z-distribution), have a mean of zero. But unlike the standard normal distribution, the t-distributions have a standard deviation larger than one. A t-distribution has a parameter called degrees of freedom. The larger the degrees of freedom in the t-distribution, the more closely it resembles the standard normal distribution.

t-Distribution: A type of distribution useful in statistics, especially in hypothesis testing, when the true standard deviation of a population is not known. Unlike a normal distribution, whose shape is fixed once the mean and standard deviation are given, the t-distribution changes shape slightly as the "degrees of freedom" (related to sample size) change. As the degrees of freedom increase, the t-distribution approximates the normal distribution (the t-distribution is generally treated as approximately normal when the degrees of freedom are 30 or more).

mean: The sum of the values in a statistical sample divided by the number of values. Also known as the average.

normal distribution: Defined by a mean and a standard deviation, a probability distribution that forms a symmetrical bell-shaped curve around the mean.

What does the term degrees of freedom mean?

A good non-statistical way to understand degrees of freedom is to think of the following situation: A manager heads a division with four departments. Part of her job is to take the $10 million budget of the division and allocate that money across the four departments. Given that the total budget of the four departments must be $10 million (or, equivalently, an average of $2.5 million per department), how many department budgets must the manager set before her job is done? The answer is three.

Once the budgets of the first three departments are determined, the remaining department budget must automatically be whatever is left over out of the $10 million. In this situation we say that the manager has three degrees of freedom. In statistics, the meaning is the same, except instead of counting the number of choices a manager must make before everything is determined, we will end up counting the number of data values we must observe, along with our estimates, until the whole sample is determined.
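The budget example can be made concrete with a short sketch. The three department budgets below are hypothetical figures chosen for illustration; only the $10 million total comes from the text.

```python
def remaining_value(chosen_values, required_total):
    """Return the one value that is forced once the others and the total are fixed."""
    return required_total - sum(chosen_values)

total_budget = 10.0                # $ million, from the example
first_three = [3.0, 2.5, 1.5]      # hypothetical free choices, $ million

fourth_budget = remaining_value(first_three, total_budget)
degrees_of_freedom = len(first_three)  # four departments minus one constraint

print(f"Fourth department receives ${fourth_budget} million")
print(f"Degrees of freedom: {degrees_of_freedom}")
```

The same arithmetic applies to a sample: once the sample mean is fixed, all but one of the data values can vary freely and the last one is determined.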

When is a t-distribution the same as a standard normal or z-distribution?

A t-distribution becomes more and more like the standard normal distribution or z-distribution as the number of degrees of freedom grows larger and larger. As the number of degrees of freedom becomes infinite, the t-distribution becomes identical to the z-distribution.
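A small numerical check can make this convergence visible. The sketch below relies on two standard textbook results that the lesson itself does not state, so treat them as assumptions here: the standard deviation of a t-distribution is sqrt(df / (df - 2)) for more than two degrees of freedom, and its density has a closed form involving the gamma function. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def t_std_dev(df):
    """Standard deviation of a t-distribution; defined only for df > 2."""
    if df <= 2:
        raise ValueError("standard deviation requires df > 2")
    return math.sqrt(df / (df - 2))

def t_pdf(x, df):
    """Density of the t-distribution, using lgamma to avoid overflow for large df."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

z = NormalDist()  # standard normal: mean 0, standard deviation 1

for df in (3, 10, 30, 1000):
    print(
        f"df={df:>4}  sd={t_std_dev(df):.4f}  "
        f"t density at 0={t_pdf(0.0, df):.4f}  z density at 0={z.pdf(0.0):.4f}"
    )
```

As the degrees of freedom grow, the standard deviation falls toward one and the height of the curve at zero rises toward that of the z-distribution.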


So far, we've familiarized ourselves with Café Caliente and, along the way, with the foundational concepts of probability and statistics. We're about to see how the managers can use statistical techniques as tools for effective decision-making.

But before we get into the details, let's clarify the two broad ways in which the managers can use statistics to make decisions with confidence. The first is to help the managers estimate unknown values that are central to a decision-making process. Often, a manager may have some intuition about some aspect of the coffee business; for example, that it takes too long to serve customers. That simple notion is vague unless it can be quantified. Statistics will allow the service manager to understand how long it takes on average to serve a customer and what the dispersion is around that average. This knowledge can be developed across stores globally and will form a solid foundation upon which the manager can begin to make decisions.

Statistics will also allow the managers to assess the accuracy of estimates. This knowledge will allow them to decide how much weight to place on decisions and actions.

What is descriptive statistics?

Descriptive statistics refers to the use of statistics as a means to conveniently describe a set of data. For example, knowing the mean, median, standard deviation, minimum and maximum of a set of numbers helps to quickly get a feel for the numbers without having to examine each one.
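As a brief illustration, the summary measures named above can be computed with Python's standard library. The daily customer counts are hypothetical data invented for this sketch, not figures from Café Caliente.

```python
import statistics

# Hypothetical daily customer counts for one store over a week.
daily_customers = [372, 385, 390, 391, 398, 404, 410]

summary = {
    "mean": statistics.mean(daily_customers),
    "median": statistics.median(daily_customers),
    "standard deviation": statistics.stdev(daily_customers),  # sample standard deviation
    "minimum": min(daily_customers),
    "maximum": max(daily_customers),
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Five numbers are enough to convey the typical level and the spread of the week's traffic without listing every day.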


The second way in which the managers can use statistics is, in essence, to add quantitative rigor to their hunches. For example, one manager might have a particular idea about how to improve service or expand sales, and several colleagues may strongly disagree with the proposed action. These managers are relying on their intuition and deductive reasoning. That's where many good ideas begin - with untested assertions. But it's not where they end.

The managers can use statistics to subject their assertions to a quantitatively rigorous process called hypothesis testing. With what degree of confidence do the observed sample data support a particular idea about the coffee business? Should the hypothesis that represents that idea be accepted or rejected, and with what implications?


These questions are part of the statistical science that, as we'll see, underlies the managers' decision making and their effectiveness in solving Café Caliente's urgent business problems. Additionally, this statistical science provides the tools to help managers communicate with investors, suppliers, customers, each other, and other interested parties. Statistics helps provide a way to support assertions or hunches; it helps us to scientifically prove the points we make to others.

What is inferential statistics?

Inferential statistics refers to the use of statistical analysis of a sample to infer something about the underlying population. It is this type of use of statistics that is emphasized in this course. Ultimately, the manager wants to use statistics to help estimate or predict aspects of the world crucial to his or her decisions.