Fall 2015 BUSA 3110 - Statistics for Business Set 1: Orientation and Context Library Readings...
Fall 2015 BUSA 3110 - Statistics for Business
Set 1: Orientation and Context Library Readings
Chapters 24 and 1.1 (Current Role of Statistics)
Kim I. Melton, Ph.D.
Module 1
• Day 1
  • Orientation
    • Logistics
      • Who are we: Roll and Introduction
      • Resources: Physical and Virtual
      • Location(s): Physical and Virtual
    • Expectations
      • Course Description
      • Content and Organization
  • Homework
Logistics
• Roll
• Brief Syllabus (note HW assignment at the end)
• Homework for the first two weeks posted at: faculty.ung.edu/kmelton/busa3110.html
• During the second week, all of my sections will be combined in D2L under Section E (There will be no change to your official Banner registration.)
• After the second week a master list of all assignments will be posted in D2L.
System of Profound Knowledge
About me
Materials That You Need to Purchase
• Custom packet (available at Dahlonega and Gainesville Bookstores)
• Includes:
  • Selected chapters from Business Statistics, 3rd Edition by Sharpe, De Veaux, and Velleman
  • Access to MyStatLab with
    • Homework management system
    • Chapter quizzes
    • Complete text of Business Statistics, 3rd edition
    • Answers to odd problems
    • Data for problems in the text
    • And more
Setting up MyStatLab.com
Ignore - I will not be linking MyStatLab to D2L this semester.
Ignore (for now): You will be able to use temporary access until Drop/Add ends.
D2L
Other Support Materials
Software.ung.edu
UNG Library
All of my sections will be combined into Section E after Drop/Add
Software for Calculations
JMP
Pros
Available for Windows and Macs (born on the Mac)
Statistical software with an interface that is similar to a spreadsheet
Dynamically linked analysis
Graphics first; computations follow
Wide array of statistical analyses available in a menu driven approach
Cons
Not as widely available
Less familiar
Requires new thought processes
Excel
Pros
Spreadsheet software that is widely available
Familiarity
Quick basic insight
Cons
The Analysis ToolPak, needed for statistical analysis beyond the most basic, is not available on all platforms (e.g., Mac and Office 365)
Analyses are not linked (especially graphical and inferential)
Requires the user to have more statistical knowledge to use effectively and appropriately
Limited in terms of analyses available
Think “and” not “or”
Learning Expectations for Class
• Attendance
  • Arrive on time
  • Stay the entire time
• Preparation
  • Spend time before class reading (text and homework)
  • Take notes while in class
  • Work homework after class
• Professionalism
  • Take responsibility for learning
  • Believe you can learn statistics
  • Ask questions
  • Try to answer questions
  • Seek help EARLY when you are struggling
  • Be ethical
  • Put phones away
INVEST
What is Statistics (as a field of study)?
Videos
• BA and Optimization [2010] (1:15 minutes)
• Think (1 minute)
• Turning Data into Insight (3:26 minutes)
How (and why) is the field of Statistics changing?
Source: http://www.datasciencecentral.com/profiles/blogs/data-veracity
Data/Information/Knowledge/Wisdom
DATA: Symbols (raw values) that represent properties of objects/events
INFORMATION: Describes; provides answers to who, what, where, and when questions
KNOWLEDGE/UNDERSTANDING: Explains; provides answers to how to and why questions
WISDOM: Evaluates knowledge/understanding; deals with values; uses judgment; answers what is best and why
Doing things right (Efficiency)
Doing the right things (Effectiveness)
Based on the work of Russell Ackoff. See “From Data to Wisdom” in Ackoff’s Best, pp. 170-174, 1999.
The Course
Applied Statistics
• Uses data
• From situations where variation exists
• As input to quantitative models
• That provide information
• To help develop the knowledge
• For making decisions
• That inform action
• For use in a practical setting
• Where theoretical assumptions may not apply perfectly
• and results and limitations need to be communicated in the language of the situation
Models (and theories) must be viewed in a context
Course Description
A second course in statistical methods with special orientation to applications in business. Emphasis will be placed on application of statistical techniques, assessing their appropriateness, and communicating results to various audiences. Topics include:
• data collection,
• sampling,
• data visualization,
• data analysis,
• model building using regression, and
• other statistical techniques.
Statistical software is used extensively in the course. This course should be taken as soon as the prerequisite is satisfied. Prereq: MATH 2400 with a grade of C or higher.
Learning Outcomes (Course Level)
Upon completion of this course, students should be able to:
• select appropriate statistical methods to guide decision-making
• generate and use statistical output to analyze data
• identify the limitations of the statistical methods covered
• communicate how statistical studies were conducted and the results of those studies
• recognize ethical issues related to the collection and analysis of data and the communication of the results of the analysis
Day 1 Homework
• Details of your homework assignment (to be completed prior to coming to class on Wednesday) are posted at faculty.ung.edu/kmelton/busa3110.html
• This assignment does include
  • reading the entire syllabus
  • accessing and reading an article available through the UNG library
  • bringing written material with you to class
  • obtaining your book and access to MyStatLab
• NOTE: We meet in the lab next class period (Dahlonega: NOC 109; Gainesville: MTN 5100)
Day 2
• Syllabus questions
• Analytics 3.0 Article
  • Analytics 1.0, 2.0, and 3.0
  • Role that statistics plays in each
  • And how the field of statistics has changed (and continues to change)
Syllabus
• Text, MyStatLab, other resources
• Accessing material D2L and MyStatLab
• Software: JMP and MS Office
• Course Format
• Grading
• General expectations (especially deadlines, make-ups, extra credit, academic integrity, phones)
Content
• Eight “Modules” (Sets of Slides)
1. Orientation and Context
2. Basics of Data Collection
3. Data Visualization and Quantitative Summary
4. Data Collection in Practice
5. Inference Involving One Variable
6. Simple Linear Regression
7. Multiple Regression Basics
8. Model Building
Data Information
Knowledge
Wisdom
Format
Each Module will include:
• Reading material (from the text and/or supplemental assignments)
• Daily graded homework (approx. 30 minutes in MyStatLab)
• Chapter quizzes (approx. 15 minutes in MyStatLab)
Most Modules will include:
• A short video related to the topic
• Computer assignments using JMP
• In-class or out-of-class graded assignments developed by the instructor
Monday classes meet in the traditional classroom (NOC 016 or MTN 3214).
Wednesday classes meet in the computer lab (NOC 109 or MTN 5100).
Grading
• MyStatLab Homework (16 points)
• MyStatLab Quizzes (16 points)
• Instructor Supplied Assignments (66 points)
• Preparation and Participation (10 points)
• Final (0-12 points)
90 and above A
80 – 89 B
70 – 79 C
60 – 69 D
Below 60 F
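As a rough sketch of how the point totals above might map onto the letter scale, the Python helper below assumes that points from all components are simply summed and compared with the cutoffs. The function name `letter_grade`, the sample scores, and the treatment of fractional totals are illustrative assumptions, not course policy.

```python
def letter_grade(total_points):
    """Map a point total to a letter using the cutoffs on the slide.

    Assumption: the cutoffs apply directly to the summed points.
    """
    if total_points >= 90:
        return "A"
    if total_points >= 80:
        return "B"
    if total_points >= 70:
        return "C"
    if total_points >= 60:
        return "D"
    return "F"


# Hypothetical student: points earned in each graded component
earned = {
    "homework": 14,        # out of 16
    "quizzes": 13,         # out of 16
    "assignments": 52,     # out of 66
    "participation": 9,    # out of 10
    "final": 6,            # out of 0-12
}
total = sum(earned.values())
print(total, letter_grade(total))
```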
General Expectations
• Learning is not a divided responsibility (I teach, you learn)—learning is a joint responsibility (we learn together)
• My “hot buttons”
  • Timeliness
  • Ethical behavior
  • Professional orientation toward learning
  • Recognition that “true” learning involves more than getting the right answer.
Analytics
• What role does statistics play in the developing field of Analytics?
OR
• How does Analytics imply a need for increased understanding of statistics?
Impact of Analytics on the Way we Think about ___
• Evolution vs. revolution
• Improvement vs. innovation
• 1st order change vs. 2nd order change
  • change in how we do something (1st order) vs. change in what we do (2nd order)
• Paradigm shift…makes us go back to the most basic assumptions
Statistics?
The Changing Face of Statistics
• Methods
  • Calculations and Graphs
  • Calculators and Tables
  • Mainframe Computers and Specialized Software
  • Personal Computers and Spreadsheets
  • Personal Computers and Specialized Software
• Users/Creators
  • From: Limited individuals with specialized training/education
  • To: Democratization of Statistics
• Major Areas of Focus
  • Descriptive (no attempt to go beyond the current data set)
  • Inferential (uses a sample to draw conclusions about a population)
  • Process oriented (uses past data collected over time to make decisions for the future)
  • Big Data and Analytics
    • Predictive
    • Prescriptive
Your List?(Role of Statistics in Analytics)
1. ___ 7. ___
2. ___ 8. ___
3. ___ 9. ___
4. ___ 10. ___
5. ___ 11. ___
6. ___ 12. ___
Day 2 Homework
• Posted at: faculty.ung.edu/kmelton/BUSA3110.html
• Includes:
  • Reading assignment
  • MyStatLab Homework
    • One due Sunday, 8/23/15 11:30pm
    • One available (due Wed., 8/26/15 7:30am)
Week 2 (Chapters 24 and 1.1 and Intro to JMP)
• Why start with the LAST chapter in the book?
• This is a second course in statistics. This chapter lets you reflect on the tools/techniques from the first course…and sets the stage for this course.
CONTEXT
Copyright © 2015 Pearson Education. All rights reserved. 24-30
24.1 The Big Data Revolution
Data mining is the process that uses a variety of data analysis tools to discover patterns and relationships in data to help build useful models and make predictions.
The more general term business analytics (or sometimes simply analytics) describes any use of statistical analysis to drive business decisions from data, whether the purpose is predictive or simply descriptive.
Big data is a term for data sets so large and complex that it becomes difficult to use traditional methods to capture, store, visualize, and analyze them.
Most of the models used in analytics and data mining are based on the regression ideas that you’ve studied.
The skills you’ve acquired by learning to fit models, diagnose them, and assess their limitations already prepare you to start entering the world of big data.
24.3 The Goals of Data Mining
Differences in Data Mining compared to Traditional Statistical Analysis
• The size of the databases. Data mining databases are enormous compared to those used in traditional analysis.
• The exploratory nature of data mining. The outcome of data mining is a predictive model, not a hypothesis test or a confidence interval.
• The data are “happenstance”. Data used in data mining have not been collected in a systematic way (such as a designed experiment or survey).
24.8 The Data Mining Process
Data mining should be a team effort, because projects require a number of different skills.
A group of data mining experts have created the following graphic called the Cross Industry Standard Process for Data Mining:
The process must start with the Business Understanding phase.
Data Understanding is central to the entire data mining project – it is crucial to understand the data warehouse, what it contains, and what limitations are present.
Once variables are selected and the response variable has been agreed upon, the Data Preparation phase begins.
Following preparation is the Data Modeling phase. The more knowledge of the data and the variables that goes into the model, the higher the chances of success for the entire project.
Finally, if the model seems to give business insight, it’s time for the Deployment phase – just keep in mind that the business environment changes rapidly, so models can become stale quickly.
24.4 Data Mining Myths
Myth 1: Find answers to unasked questions.
Myth 2: Automatically monitor a database for interesting patterns.
Myth 3: Eliminate the need to understand the business.
Myth 4: Eliminate the need to collect good data.
Myth 5: Eliminate the need for good data analysis skills.
24.5 Successful Data Mining
The first step is to have a well-defined business problem, which can help you avoid going down a lot of blind paths.
Typically, 65% to 90% of the time is spent in data preparation – investigating missing values, correcting wrong entries, reconciling data definitions, or creating new variables from old ones.
Example: Data Preparation
The PVA dataset includes 470 potential predictor variables for modeling whether a donor will donate to the next campaign.
One variable, Age, may need corrections or adjustments so that it can be a more effective predictor in a regression-based model.
Graphs are provided on the next slide for all 94,649 potential donors:
The group of cases with ages below 20 should immediately draw our attention – are people that young likely to be donors?
Closer examination reveals 23,479 missing values, and the potential donors with ages below 15 and possibly above 100 or 110 warrant further investigation before proceeding with the model build phase.
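A check of this kind can be sketched in a few lines of Python. The ages below are made-up values, not the PVA data, and the variable names are hypothetical; the cutoffs of 15 and 100 are taken from the slide.

```python
# Hypothetical ages; None marks a missing value (made-up data, not the PVA file)
ages = [34, None, 7, 52, 103, None, 68, 14, 45, 112]

# Count the missing values before doing anything else
missing = sum(1 for a in ages if a is None)
known = [a for a in ages if a is not None]

# Cutoffs from the slide: ages below 15 or above 100 deserve a second look
suspicious = [a for a in known if a < 15 or a > 100]

print("missing:", missing)
print("flagged for review:", suspicious)
```

Flagged values are not automatically wrong. The point is to investigate them before the model build phase rather than to delete them.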
24.6 Data Mining Problems
When the goal is to predict a quantitative response variable, the problem is generically called a regression problem.
When the response variable is categorical, the problem is referred to as a classification problem.
Collectively, both types of problems are called supervised problems, because we are given a set of data for which we know the value of the response.
Often, the data miner constructs a model based on a portion of the original data, called the training set.
To assess how well the model will work on future data, the modeler uses the data not used in the training set as a test set.
A final type of data mining problem is an unsupervised problem – one in which there is no particular response variable. The goal may be to build clusters of data with similar attributes.
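The training/test idea for a supervised regression problem can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the data are simulated, the 75/25 split is arbitrary, and the helper names `fit_line` and `rmse` are hypothetical.

```python
import random

random.seed(1)  # fixed seed so the split is reproducible

# Simulated data: y is roughly 3 + 2x plus random noise
data = [(x, 3 + 2 * x + random.gauss(0, 1)) for x in range(40)]
random.shuffle(data)

cut = int(0.75 * len(data))  # 75/25 split is an arbitrary choice
train, test = data[:cut], data[cut:]


def fit_line(points):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(points, intercept, slope):
    """Root mean squared prediction error on a set of points."""
    sq_err = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    return (sq_err / len(points)) ** 0.5


b0, b1 = fit_line(train)  # fit on the training set only
print("slope:", round(b1, 2))
print("train RMSE:", round(rmse(train, b0, b1), 2))
print("test RMSE:", round(rmse(test, b0, b1), 2))
```

A test error much larger than the training error would suggest that the model does not generalize well to data it has not seen.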
Be sure that the question to be answered is specific. A goal as vague as “improving the business” is not likely to be successful.
Be sure that the data have the potential to answer the question. Check the variables to see whether a model can reasonably be built to predict the response.
Be aware of overfitting the data. Make sure you validate the model on a test set.
Make sure that the data are ready to use in the data mining model. Missing values, incorrect entries, and different time scales are all challenges that need to be overcome.
Don’t try it alone. Data mining projects require a variety of skills and a lot of work. Assemble the right team of people.
1.1 What Are Data?
Data collected for recording the companies’ transactions are called transactional data.
The process of using transactional data to make other decisions and predictions is sometimes called data mining or predictive analytics.
Business analytics describes any use of statistical analysis to drive business decisions from data.
All data have a context.
Data values or observations are information collected regarding some subject. The “Five W’s”: who, what, when, where, and (if possible) why. Often we add how to the list.
Data can be numbers, names, etc., and tell us the “Who and What”.
Data are often organized into a data table like that below.
The rows of a data table correspond to individual cases about Whom we record some characteristics.
These characteristics may be collected on or about …
• respondents – individuals who answer a survey
• subjects or participants – people in an experiment
• experimental units – animals, plants, websites, or other inanimate objects
Cases
The characteristics recorded about each individual or case are called variables.
These are usually shown as the columns of a data table and identify What has been measured.
Variables
Metadata typically contains information about how, when, and where (and possibly why) the data were collected; who each case represents; and the definitions of all the variables.
Data are typically saved in a spreadsheet, where the rows represent cases and the columns represent variables.
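The rows-as-cases, columns-as-variables layout can be sketched in plain Python. The table and its metadata below are invented for illustration; none of the names or values come from the text.

```python
# Hypothetical data table: each dict is one case (row), each key a variable (column)
orders = [
    {"customer": "A. Lee", "item": "Headphones", "price": 59.99, "state": "GA"},
    {"customer": "B. Cruz", "item": "Keyboard", "price": 34.50, "state": "TN"},
    {"customer": "C. Shah", "item": "Monitor", "price": 189.00, "state": "GA"},
]

# Hypothetical metadata: who the cases are and what each variable means
metadata = {
    "who": "one row per online order",
    "when": "made-up example, no real collection date",
    "variables": {
        "price": "purchase price in US dollars",
        "state": "two-letter shipping state",
    },
}

variables = list(orders[0].keys())            # the What: column names
prices = [case["price"] for case in orders]   # one column, read across all cases

print(len(orders), "cases;", len(variables), "variables:", variables)
print("mean price:", round(sum(prices) / len(prices), 2))
```

Keeping the metadata next to the table preserves the context (the Who, What, When, and Where) that gives the values meaning.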
Day 3 Homework
• Posted at: faculty.ung.edu/kmelton/BUSA3110.html
• Includes:
  • Watching JMP videos
  • Installing JMP on your computer
  • Graded:
    • One MyStatLab HW due Wed., 8/26/15 7:30am
    • Three available
      • P&P Roll Verification/Logistics-D2L (due 8/27/15 1:00pm)
      • MyStatLab HW (due 8/30/15 7:30am)
      • MyStatLab Chapter 1 quiz (due 8/30/15 11:30pm)