Fall 2015 BUSA 3110 - Statistics for Business Set 1: Orientation and Context Library Readings...

Fall 2015 BUSA 3110 - Statistics for Business

Set 1: Orientation and Context

Library Readings

Chapters 24 and 1.1 (Current Role of Statistics)

Kim I. Melton, Ph.D.

Module 1

• Day 1
• Orientation
  • Logistics
    • Who are we: Roll and Introduction
    • Resources: Physical and Virtual
    • Location(s): Physical and Virtual
  • Expectations
  • Course Description
    • Content and Organization
• Homework

Logistics

• Roll
• Brief Syllabus (note HW assignment at the end)
• Homework for the first two weeks posted at: faculty.ung.edu/kmelton/busa3110.html

• During the second week, all of my sections will be combined in D2L under Section E (There will be no change to your official Banner registration.)

• After the second week a master list of all assignments will be posted in D2L.


System of Profound Knowledge

About me

Materials That You Need to Purchase

• Custom packet (available at Dahlonega and Gainesville Bookstores)
• Includes:
  • Selected chapters from Business Statistics, 3rd Edition by Sharpe, De Veaux, and Velleman
  • Access to MyStatLab with
    • Homework management system
    • Chapter quizzes
    • Complete text of Business Statistics, 3rd edition
    • Answers to odd problems
    • Data for problems in the text
    • And more

Setting up MyStatLab.com

Ignore - I will not be linking MyStatLab to D2L this semester.

Ignore (for now): You will be able to use temporary access until Drop/Add ends.


D2L

Other Support Materials

Software.ung.edu

UNG Library

All of my sections will be combined into Section E after Drop/Add

Software for Calculations


JMP

Pros

Available for Windows and Macs (born on the Mac)

Statistical software with an interface that is similar to a spreadsheet

Dynamically linked analysis

Graphics first; computations follow

Wide array of statistical analyses available in a menu driven approach

Cons

Not as widely available

Less familiar

Requires new thought processes

Excel

Pros

Spreadsheet software that is widely available

Familiarity

Quick basic insight

Cons

The Analysis ToolPak, which is needed for statistical analysis beyond the most basic, is not available on all platforms (e.g., Mac and Office 365)

Analyses are not linked (especially graphical and inferential)

Requires the user to have more statistical knowledge to use effectively and appropriately

Limited in terms of analyses available

Think “and” not “or”

Learning Expectations for Class

• Attendance
  • Arrive on time
  • Stay the entire time
• Preparation
  • Spend time before class reading (text and homework)
  • Take notes while in class
  • Work homework after class
• Professionalism
  • Take responsibility for learning
    • Believe you can learn statistics
    • Ask questions
    • Try to answer questions
    • Seek help EARLY when you are struggling
  • Be ethical
  • Put phones away

INVEST


What is Statistics (as a field of study)?

Videos
• BA and Optimization [2010] (1:15 minutes)
• Think (1 minute)
• Turning Data into Insight (3:26 minutes)

How (and why) is the field of Statistics changing?

Source: http://www.datasciencecentral.com/profiles/blogs/data-veracity

https://www.youtube.com/watch?v=fFdITHMuy2w


https://www.youtube.com/watch?v=QL1dQuK5Wsg

https://www.youtube.com/watch?v=6jDjeNJrN14


Data/Information/Knowledge/Wisdom

Doing things right (Efficiency)

Doing the right things (Effectiveness)

DATA: Symbols (raw values) that represent properties of objects/events

INFORMATION: Describes; provides answers to who, what, where, and when questions

KNOWLEDGE/UNDERSTANDING: Explains; provides answers to how to and why questions

WISDOM: Evaluates knowledge/understanding; deals with values; uses judgment; answers what is best and why

Based on the work of Russell Ackoff. See “From Data to Wisdom” in Ackoff’s Best, pp. 170-174, 1999.

The Course

Statistics
• Uses data
• From situations where variation exists
• As input to quantitative models
• That provide information
• To help develop the knowledge
• For making decisions
• That inform action

Applied
• For use in a practical setting
• Where theoretical assumptions may not apply perfectly
• and results and limitations need to be communicated in the language of the situation

Models (and theories) must be viewed in a context


Course Description

A second course in statistical methods with special orientation to applications in business. Emphasis will be placed on application of statistical techniques, assessing their appropriateness, and communicating results to various audiences. Topics include:
• data collection,
• sampling,
• data visualization,
• data analysis,
• model building using regression, and
• other statistical techniques.

Statistical software is used extensively in the course. This course should be taken as soon as the prerequisite is satisfied. Prereq: MATH 2400 with a grade of C or higher.

Learning Outcomes (Course Level)

Upon completion of this course, students should be able to:

• select appropriate statistical methods to guide decision-making
• generate and use statistical output to analyze data
• identify the limitations of the statistical methods covered
• communicate how statistical studies were conducted and the results of those studies
• recognize ethical issues related to the collection and analysis of data and the communication of the results of the analysis

Day 1 Homework

• Details of your homework assignment (to be completed prior to coming to class on Wednesday) are posted at faculty.ung.edu/kmelton/busa3110.html

• This assignment does include
  • reading the entire syllabus
  • accessing and reading an article available through the UNG library
  • bringing written material with you to class
  • obtaining your book and access to MyStatLab

• NOTE: We meet in the lab next class period (Dahlonega: NOC 109; Gainesville: MTN 5100)

Day 2

• Syllabus questions
• Analytics 3.0 Article
  • Analytics 1.0, 2.0, and 3.0
  • Role that statistics plays in each
  • And how the field of statistics has changed (and continues to change)

Syllabus

• Text, MyStatLab, other resources
• Accessing material D2L and MyStatLab
• Software: JMP and MS Office
• Course Format
• Grading
• General expectations (especially deadlines, make-ups, extra credit, academic integrity, phones)

Content

• Eight “Modules” (Sets of Slides)

1. Orientation and Context

2. Basics of Data Collection

3. Data Visualization and Quantitative Summary

4. Data Collection in Practice

5. Inference Involving One Variable

6. Simple Linear Regression

7. Multiple Regression Basics

8. Model Building

Data / Information / Knowledge / Wisdom

Format

Each Module will include:
• Reading material (from the text and/or supplemental assignments)
• Daily graded homework (app. 30 minutes in MyStatLab)
• Chapter quizzes (app. 15 minutes in MyStatLab)

Most Modules will include:
• A short video related to the topic
• JMP computer assignments using
• In-class or out-of-class graded assignments developed by the instructor

Monday classes meet in the traditional classroom (NOC 016 or MTN 3214).

Wednesday classes meet in the computer lab (NOC 109 or MTN 5100).

Grading

• MyStatLab Homework (16 points)

• MyStatLab Quizzes (16 points)

• Instructor Supplied Assignments (66 points)

• Preparation and Participation (10 points)

• Final (0-12 points)

90 and above A

80 – 89 B

70 – 79 C

60 – 69 D

Below 60 F

General Expectations

• Learning is not a divided responsibility (I teach, you learn)—learning is a joint responsibility (we learn together)

• My “hot buttons”
  • Timeliness
  • Ethical behavior
  • Professional orientation toward learning
  • Recognition that “true” learning involves more than getting the right answer.

Analytics


• What role does statistics play in the developing field of Analytics?

OR

• How does Analytics imply a need for increased understanding of statistics?


Impact of Analytics on the Way we Think about ___

• Evolution vs. revolution
• Improvement vs. innovation
• 1st order change vs. 2nd order change
  • change in how we do something (1st order) vs. change in what we do (2nd order)
• Paradigm shift…makes us go back to the most basic assumptions

Statistics?

The Changing Face of Statistics

• Methods
  • Calculations and Graphs
  • Calculators and Tables
  • Mainframe Computers and Specialized Software
  • Personal Computers and Spreadsheets
  • Personal Computers and Specialized Software
• Users/Creators
  • From: Limited individuals with specialized training/education
  • To: Democratization of Statistics
• Major Areas of Focus
  • Descriptive (no attempt to go beyond the current data set)
  • Inferential (uses a sample to draw conclusions about a population)
  • Process oriented (uses past data collected over time to make decisions for the future)
  • Big Data and Analytics
    • Predictive
    • Prescriptive

Your List? (Role of Statistics in Analytics)

1. ___ 7. ___

2. ___ 8. ___

3. ___ 9. ___

4. ___ 10. ___

5. ___ 11. ___

6. ___ 12. ___


Day 2 Homework

• Posted at: faculty.ung.edu/kmelton/BUSA3110.html

• Includes:
  • Reading assignment
  • MyStatLab Homework
    • One due Sunday, 8/23/15 11:30pm
    • One available (due Wed., 8/26/15 7:30am)

Week 2 (Chapters 24 and 1.1 and Intro to JMP)

• Why start with the LAST chapter in the book?

• This is a second course in statistics. This chapter lets you reflect on the tools/techniques from the first course…and sets the stage for this course.

CONTEXT

Copyright © 2015 Pearson Education. All rights reserved. 24-30

24.1 The Big Data Revolution

Data mining is the process that uses a variety of data analysis tools to discover patterns and relationships in data to help build useful models and make predictions.

The more general term business analytics (or sometimes simply analytics) describes any use of statistical analysis to drive business decisions from data, whether the purpose is predictive or simply descriptive.

Big data is a term for data sets so large and complex that it becomes difficult to use traditional methods to capture, store, visualize, and analyze them.


24.1 The Big Data Revolution

Most of the models used in analytics and data mining are based on the regression ideas that you’ve studied.

The skills you’ve acquired by learning to fit models, diagnose them, and assess their limitations already prepare you to start entering the world of big data.


24.3 The Goals of Data Mining

Differences in Data Mining compared to Traditional Statistical Analysis

• The size of the databases. Data mining examples are enormous compared to traditional analysis.

• The exploratory nature of data mining. The outcome of data mining is a predictive model, not a hypothesis test or a confidence interval.

• The data are “happenstance”. Data used in data mining have not been collected in a systematic way (through a designed experiment or survey).


24.8 The Data Mining Process

Data mining should be a team effort, because projects require a number of different skills.

A group of data mining experts have created the following graphic called the Cross Industry Standard Process for Data Mining:


24.8 The Data Mining Process

The process must start with the Business Understanding phase.

Data Understanding is central to the entire data mining project – it is crucial to understand the data warehouse, what it contains, and what limitations are present.

Once variables are selected and the response variable has been agreed upon, the Data Preparation phase begins.

Following preparation is the Data Modeling phase. The more knowledge of the data and the variables that goes into the model, the higher the chances of success for the entire project.

Finally, if the model seems to give business insight, it’s time for the Deployment phase – just keep in mind that the business environment changes rapidly, so models can become stale quickly.


24.4 Data Mining Myths

Myth 1: Find answers to unasked questions.

Myth 2: Automatically monitor a database for interesting patterns.

Myth 3: Eliminate the need to understand the business.

Myth 4: Eliminate the need to collect good data.

Myth 5: Eliminate the need for good data analysis skills.


24.5 Successful Data Mining

The first step is to have a well-defined business problem, which can help you avoid going down a lot of blind paths.

Typically, 65% to 90% of the time is spent in data preparation – investigating missing values, correcting wrong entries, reconciling data definitions, or creating new variables from old ones.



Example: Data Preparation

The PVA dataset includes 470 potential predictor variables for modeling whether a donor will donate to the next campaign.

One variable, Age, may need corrections or adjustments so that it can be a more effective predictor in a regression-based model.

Graphs are provided on the next slide for all 94,649 potential donors:



The group of cases with ages below 20 should immediately draw our attention – are people that young likely to be donors?



Example: Data Preparation

Closer examination reveals 23,479 missing values, and the potential donors with ages below 15 and possibly above 100 or 110 warrant further investigation before proceeding with the model-building phase.
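A check like the one described for Age can be sketched in a few lines of Python. This is only an illustration, not the procedure used with the PVA data: the function name `audit_ages`, the cutoffs of 15 and 100 (borrowed from the wording above), and the sample values are all assumptions.

```python
# Assumed cutoffs, taken from the wording of the slide; adjust to the business context.
MIN_PLAUSIBLE_AGE = 15
MAX_PLAUSIBLE_AGE = 100

def audit_ages(ages):
    """Count missing, suspiciously low, suspiciously high, and plausible ages."""
    counts = {"missing": 0, "too_low": 0, "too_high": 0, "plausible": 0}
    for age in ages:
        if age is None:                      # missing value
            counts["missing"] += 1
        elif age < MIN_PLAUSIBLE_AGE:        # probably a data-entry problem
            counts["too_low"] += 1
        elif age > MAX_PLAUSIBLE_AGE:        # possible, but worth a second look
            counts["too_high"] += 1
        else:
            counts["plausible"] += 1
    return counts

# Made-up values for illustration; these are not the PVA data.
sample_ages = [34, None, 7, 62, 105, None, 48, 15, 100]
print(audit_ages(sample_ages))
```

The counts only flag cases for investigation. Deciding whether to correct, drop, or keep a flagged value still requires understanding how the data were collected.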


24.6 Data Mining Problems

When the goal is to predict a quantitative response variable, the problem is generically called a regression problem.

When the response variable is categorical, the problem is referred to as a classification problem.

Collectively, both types of problems are called supervised problems, because we are given a set of data for which we know the value of the response.


24.6 Data Mining Problems

Often, the data miner constructs a model based on a portion of the original data, called the training set.

To assess how well the model will work on future data, the modeler uses the data not used in the training set as a test set.

A final type of data mining problem is an unsupervised problem – one in which there is no particular response variable. The goal may be to build clusters of data with similar attributes.
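The split into a training set and a test set can be sketched as follows. This is a minimal illustration using only Python's standard library: the helper name `train_test_split`, the 30% test fraction, and the ten placeholder cases are assumptions, not values from the text.

```python
import random

def train_test_split(cases, test_fraction=0.3, seed=0):
    """Shuffle the cases, then hold out a fraction of them as the test set."""
    shuffled = list(cases)                    # copy so the original order is untouched
    random.Random(seed).shuffle(shuffled)     # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    test_set = shuffled[:n_test]
    training_set = shuffled[n_test:]
    return training_set, test_set

# Ten placeholder cases standing in for rows of a real data table.
cases = list(range(10))
training_set, test_set = train_test_split(cases)
print(len(training_set), "training cases;", len(test_set), "test cases")
```

The model is built using only `training_set`; `test_set` is set aside and used afterward to judge how the model might perform on data it has not seen.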


Be sure that the question to be answered is specific. A goal as vague as “improving the business” is not likely to be successful.

Be sure that the data have the potential to answer the question. Check the variables to see whether a model can reasonably be built to predict the response.

Be aware of overfitting the data. Make sure you validate the model on a test set.


Make sure that the data are ready to use in the data mining model. Missing values, incorrect entries, and different time scales are all challenges that need to be overcome.

Don’t try it alone. Data mining projects require a variety of skills and a lot of work. Assemble the right team of people.
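The advice about overfitting can be made concrete with a small sketch. The "model" below simply memorizes the training cases and predicts the response of the closest one, an extreme case chosen for illustration. The function names and the data values are made up.

```python
def nearest_neighbor_predict(train, x):
    """Predict y by copying the response of the closest training case."""
    nearest_x, nearest_y = min(train, key=lambda case: abs(case[0] - x))
    return nearest_y

def mean_squared_error(train, cases):
    """Average squared prediction error of the memorizing model on `cases`."""
    errors = [(y - nearest_neighbor_predict(train, x)) ** 2 for x, y in cases]
    return sum(errors) / len(errors)

# Made-up (x, y) pairs; the test cases were not used to build the model.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1), (6, 11.9)]
test = [(1.5, 3.0), (2.5, 5.0), (3.5, 7.0), (4.5, 9.0), (5.5, 11.0)]

print("training MSE:", mean_squared_error(train, train))
print("test MSE:", mean_squared_error(train, test))
```

The model reproduces every training case exactly, so its training error is zero, yet it still makes errors on the test cases. Judging the model by the training error alone would overstate how well it works, which is why validation on a test set matters.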


Data collected for recording companies’ transactions are called transactional data.

The process of using transactional data to make other decisions and predictions is sometimes called data mining or predictive analytics.

Business analytics describes any use of statistical analysis to drive business decisions from data.

1.1 What Are Data?


All data have a context.

Data values or observations are information collected regarding some subject. Their context comes from the “Five W’s”: who, what, when, where, and (if possible) why. Often we add how to the list.

Data can be numbers, names, etc., and tell us the “Who and What”.

Data are often organized into a data table like that below.

1.1 What Are Data?


The rows of a data table correspond to individual cases about Whom we record some characteristics.

These characteristics may be collected on or about …

• respondents – individuals who answer a survey

• subjects or participants – people in an experiment

• experimental units – animals, plants, websites, or other inanimate objects

Cases

1.1 What Are Data?


The characteristics recorded about each individual or case are called variables.

These are usually shown as the columns of a data table and identify What has been measured.

Variables

1.1 What Are Data?


Metadata typically contains information about how, when, and where (and possibly why) the data were collected; who each case represents; and the definitions of all the variables.

Data are typically saved in a spreadsheet, where the rows represent cases and the columns represent variables.
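The row and column layout, together with its metadata, can be sketched in Python as a list of records. Everything in this example (the variable names, the values, and the metadata entries) is hypothetical and serves only to show the structure.

```python
# Each dictionary is one case (a row); each key is a variable (a column).
data_table = [
    {"order_id": 1001, "customer": "A. Lee", "amount": 25.50},
    {"order_id": 1002, "customer": "B. Cruz", "amount": 112.00},
    {"order_id": 1003, "customer": "C. Patel", "amount": 8.75},
]

# Metadata records the context: who, when, where, how, and variable definitions.
metadata = {
    "who": "one row per customer order",
    "when": "hypothetical collection period",
    "where": "hypothetical online store",
    "how": "recorded automatically at checkout",
    "variables": {
        "order_id": "unique identifier for the order",
        "customer": "name of the purchasing customer",
        "amount": "order total in US dollars",
    },
}

variables = list(data_table[0].keys())
print(len(data_table), "cases;", "variables:", variables)
```

Keeping the variable definitions next to the table makes it harder to misread a column, for example treating `order_id` as a quantity rather than an identifier.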

1.1 What Are Data?

Day 3 Homework

• Posted at: faculty.ung.edu/kmelton/BUSA3110.html
• Includes:
  • Watching JMP videos
  • Installing JMP on your computer
  • Graded:
    • One MyStatLab HW due Wed., 8/26/15 7:30am
    • Three available
      • P&P Roll Verification/Logistics-D2L (due 8/27/15 1:00pm)
      • MyStatLab HW (due 8/30/15 7:30am)
      • MyStatLab Chapter 1 quiz (due 8/30/15 11:30pm)
