School of Six Sigma Data Types
Overview
In this module we’re going to discuss data types. By the end of this module you’ll know the difference between Sample Statistics and Population Parameters. You’ll also know what the different types of data are, as well as the best kind to use whenever possible.
Differences Between Sample and Population
Let’s get started by learning the difference between a Sample and a Population. To do this we’re going to use an example. Let’s assume a bank wants to gauge its customers’ interest in some new features and has developed a short online survey. Let’s also assume this is a large bank with more than 50,000 paying customers.
Obviously, reaching all 50,000 customers would prove to be both difficult and expensive. The bank may only be able to reach 10,000 customers. These 10,000 customers would be known as a sample of the overall population of 50,000 customers.
When we speak of a Population we’re referring to a collection of ALL subjects or objects of interest, with the key word being ALL. We’ll rarely have access to an entire population.
Conversely, a Sample is a subset of the population used to make inferences about the characteristics of the population. So, instead of contacting all 50,000 customers the bank would send the survey to a subset, or sample, of the overall population. When we’re dealing with Samples we’re actually working with Sample STATISTICS and when we’re dealing with a Population we’re actually working with Population PARAMETERS.
When we’re speaking about the mean, the Sample Statistic is called X bar (x̄) while the Population Parameter is called mu (μ). When we’re speaking about the Standard Deviation, the Sample Statistic is a lower case s while the Population Parameter is a lower case sigma (σ). You’ll notice the Population Parameters are Greek letters while Sample Statistics are Roman letters.
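To make the distinction concrete, here is a minimal Python sketch of the bank example. The satisfaction scores are invented purely for illustration (the module does not say what the survey measures), and the variable names mu, sigma, x_bar, and s simply mirror the symbols above. It uses the standard library's statistics module, where pstdev divides by N (population) and stdev divides by n - 1 (sample).

```python
import random
import statistics

random.seed(20)  # fixed seed so the sketch is repeatable

# Hypothetical population: a satisfaction score (1-10) for each of 50,000 customers.
population = [random.randint(1, 10) for _ in range(50_000)]

# The bank only reaches 10,000 customers: a random sample drawn without replacement.
sample = random.sample(population, 10_000)

# Population parameters (Greek letters): computed from ALL customers.
mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # population form, divides by N

# Sample statistics (Roman letters): computed from the subset only.
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)           # sample form, divides by n - 1

print(f"mu    = {mu:.3f}   sigma = {sigma:.3f}")
print(f"x_bar = {x_bar:.3f}   s     = {s:.3f}")
```

In practice we would almost never be able to compute mu and sigma directly. The sketch only computes them so the sample statistics can be compared against the values they are meant to estimate.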
Two Main Types of Data
Now that we know the difference between Sample Statistics and Population Parameters, let’s turn our attention to the two main types of data we’ll work with as continuous improvement practitioners.
The first type is attributes data. When we speak of attributes data there are actually two variations. The first type is called binary data. With binary data we’re dealing with two levels. For example, we either pass or fail. The light is either on or off. The product is either good or bad.
The second form of attributes data is count data. With this type of data we’re able to count things, as the name implies. For example, if someone fails a test we can count how many answers they missed. If the product is bad we can count the number of defects, and so on. If it’s available, we’ll always want to use count data versus binary data since we can learn so much more about the situation.
For example, instead of saying a product is bad, it would be much more useful if we could count the number of defects on the product. Or instead of simply telling the student he failed the exam, it would be useful if we could tell him exactly how many questions he missed.
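The difference between the two forms of attributes data can be sketched in a few lines of Python. The defect counts below are made up for illustration, and the rule that a product is "bad" if it has at least one defect is an assumption of this sketch, not something the module specifies.

```python
# Hypothetical inspection results: number of defects found on each of 8 products.
defect_counts = [0, 3, 0, 1, 0, 0, 5, 1]

# Binary view: a product is "bad" if it has at least one defect (assumed rule).
is_bad = [count > 0 for count in defect_counts]

bad_products = sum(is_bad)           # the most the binary data can tell us
total_defects = sum(defect_counts)   # only available from count data
worst_product = max(defect_counts)   # only available from count data

print(f"Bad products: {bad_products} of {len(defect_counts)}")
print(f"Total defects: {total_defects}, worst single product: {worst_product}")
```

The binary data can always be derived from the count data, but not the other way around, which is why count data is preferred when it is available.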
The second type of data is called variables data, sometimes referred to as continuous data. Variables data comes from a measurement scale that can be divided into finer and finer increments. Things like weight, distance, dimensions, and speed are all examples of variables data.
What Type of Data is Best?
The question is: which type of data is best? If both are available, what type of data should we seek to collect and analyze?
The answer is: if it’s available we always want to collect and analyze variables data.
There are some statistical reasons for this related to something called power and sample size which we’ll learn about later in the course, but the gist of it comes down to the fact that variables data is more powerful, statistically speaking, than attributes data.
We may only need 30 data points of variables data to characterize a process while we may need 100 data points of attributes data to learn anything at all. When possible always seek to collect and analyze variables data since we can learn so much more.
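One informal way to see why variables data tells us more is to look at what is lost when measurements are reduced to pass or fail. The sketch below is only an illustration under assumed values: the shaft diameters, the spec limits, and the pass_rate helper are all hypothetical, and it does not demonstrate statistical power or sample size, which are covered later in the course.

```python
import statistics

# Hypothetical spec: a shaft diameter must fall between 9.90 mm and 10.10 mm.
LOWER, UPPER = 9.90, 10.10

batch_a = [10.00, 10.01, 9.99, 10.02, 9.98, 10.00]    # centered on the target
batch_b = [10.08, 10.09, 10.07, 10.09, 10.08, 10.09]  # drifting toward the upper limit

def pass_rate(measurements):
    """Attributes (binary) view: fraction of parts inside the spec limits."""
    passed = [LOWER <= m <= UPPER for m in measurements]
    return sum(passed) / len(passed)

for name, batch in (("A", batch_a), ("B", batch_b)):
    print(f"Batch {name}: pass rate = {pass_rate(batch):.0%}, "
          f"mean = {statistics.mean(batch):.3f} mm, "
          f"s = {statistics.stdev(batch):.3f} mm")
```

Both batches look identical as attributes data, since every part passes. Only the variables data shows that the second batch is running close to the upper limit.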