15.060 Data, Models, Decisions

Final Review
Robert Freund, David Gamarnik, and Andreas Schulz, course materials for 15.060 Data, Models, and Decisions, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].
Final Exam
Date: Monday, December 17
Time: 9am-12pm
Closed book exam
You can bring a calculator
Formula Sheet will be provided
BUT get a good night's sleep!
December 15 2007 DMD Fall 07 Final Review
Table of Contents

Topic 1: Decision Analysis
Topic 2: Discrete Random Variables
Topic 3: Covariance and Correlation
Topic 4: Continuous Random Variables
Topic 5: Statistical Sampling
Topic 6: Simulation
Topic 7: Regression
Topic 8: Linear Optimization
Topic 9: Nonlinear Optimization
Topic 10: Discrete Optimization
You are NOT responsible for:
TOPIC 1: Decision Analysis
Conditional Probabilities
TOPIC 2: Discrete Random Variables
Discrete Random Variables
• A probability distribution for a discrete random variable X consists of
(i) possible values x1, x2, . . . , xn,
(ii) corresponding probabilities p1, p2, . . . , pn,
so that: P(X = x1) = p1, P(X = x2) = p2, . . . , P(X = xn) = pn.
• A histogram is a display of probabilities as a bar chart.
[Figure: histogram of P(Y = y) against y]
• Probabilities are non-negative and must sum to 1.
• The possible values are mutually exclusive and collectively exhaustive (describe all the possibilities that can happen).
3 important measures
1. Expected Value or Mean: (measured in units of X)
“Average outcome” – measure of central tendency
$E(X) = \mu_X = \sum_i P(X = x_i)\, x_i = \sum_i p_i x_i$
2. Variance: (in units of X squared)
Squared deviation around the mean – measure of “spread”
$\mathrm{Var}(X) = \sigma_X^2 = \sum_i P(X = x_i)(x_i - \mu_X)^2 = \sum_i p_i (x_i - \mu_X)^2$
3. Standard Deviation: (in units of X)
Measure of “spread”
$\sigma_X = \sqrt{\mathrm{Var}(X)}$
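As a quick check of the three measures, here is a minimal Python sketch. The distribution (`values`, `probs`) and the helper name `summarize` are made up for illustration and are not from the course materials.

```python
from math import sqrt

def summarize(values, probs):
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    # Probabilities must be non-negative and sum to 1.
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must be non-negative and sum to 1")
    mean = sum(p * x for x, p in zip(values, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))
    return mean, var, sqrt(var)

# Hypothetical distribution, chosen only for illustration.
values = [0, 1, 2, 3]
probs = [0.1, 0.2, 0.5, 0.2]
mean, var, sd = summarize(values, probs)
print(mean, var, sd)
```

The mean is in the units of X, the variance in squared units, and the standard deviation is back in the units of X.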
You are NOT responsible for:
The Binomial distribution
TOPIC 3: Covariance and Correlation
Covariance:
$\mathrm{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_{i,j} P(X = x_i; Y = y_j)(x_i - \mu_X)(y_j - \mu_Y)$
Measures the extent to which two random variables vary together.
Correlation: $\mathrm{CORR}(X,Y) = \dfrac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$
Correlation is unitless
Working with joint distributions
Suppose X and Y are two random variables with joint distribution $P(X = x_i; Y = y_k)$:
Marginal distribution of X (from the joint distribution of X and Y):
$P(X = x_i) = \sum_k P(X = x_i; Y = y_k)$
$E[X] = \sum_i x_i\, P(X = x_i)$
$\mathrm{Var}(X) = \sigma_X^2 = \sum_i (x_i - \mu_X)^2\, P(X = x_i)$
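The marginal, covariance and correlation formulas can be sketched in a few lines of Python. The joint table `joint` below is hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
from math import sqrt

# Hypothetical joint distribution P(X = x; Y = y), for illustration only.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def marginal(joint, axis):
    """Sum the joint probabilities over the other variable."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

def mean(dist):
    return sum(v * p for v, p in dist.items())

def variance(dist):
    m = mean(dist)
    return sum(p * (v - m) ** 2 for v, p in dist.items())

px, py = marginal(joint, 0), marginal(joint, 1)
mu_x, mu_y = mean(px), mean(py)
cov = sum(p * (x - mu_x) * (y - mu_y) for (x, y), p in joint.items())
corr = cov / sqrt(variance(px) * variance(py))
print(px, py, cov, corr)
```

Here the covariance is positive, so the correlation is positive and (being unitless) lies between -1 and 1.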
Sums of random variables
Mean of a sum:
$E[aX + bY + c] = a\,E[X] + b\,E[Y] + c$
Variance of a sum:
$\mathrm{Var}(aX + bY + c) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{COV}(X,Y) = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\sigma_X \sigma_Y\,\mathrm{CORR}(X,Y)$
TOPIC 4: Continuous random variables
Continuous random variables
• A continuous r.v. can take any value in some interval
Example: W = Time spent waiting in line at Au Bon Pain!
• There are an “infinite” number of possible values that the random variable can assume
• For a continuous random variable, questions are phrased in terms of a range of values.
NOTE: You would never say: “Probability to wait exactly 10.5 minutes”!
P(W = 10.5) = 0
But: Probability to wait:
• Less than 10 minutes: P(W < 10);
• More than 20 minutes: P(W > 20);
• Between 10 and 15 minutes: P(10 < W < 15).

Density functions
Probability density function:
Denoted f(t): gives a “picture” of the distribution (think of a smoothed histogram)
Area under the curve between 2 values a and b: P(a ≤ X ≤ b)
Total area under the curve = 1 (total probability)
Cumulative distribution function: F(t) = P(X ≤ t)
P(X ≥ t) = 1 - F(t)
P(a ≤ X ≤ b) = P(X ≤ b) - P(X ≤ a) = F(b) – F(a)
[Figures: an example density function f(t), the corresponding cumulative function F(t), and a Normal density curve]
Computing probabilities with the Normal distribution:
You want: P(a ≤ X ≤ b) where X is N(μ, σ)
1. Define: $Z = \dfrac{X - \mu}{\sigma}$; Z is N(0,1)
2. Use the standard normal probability table (Z table)
$P(a \le X \le b) = P\left(\dfrac{a - \mu}{\sigma} \le Z \le \dfrac{b - \mu}{\sigma}\right) = P\left(Z \le \dfrac{b - \mu}{\sigma}\right) - P\left(Z \le \dfrac{a - \mu}{\sigma}\right)$
[Figure: Normal density with the axis marked from μ − 3σ to μ + 3σ; 68.3% of the area lies within μ ± σ and 95.4% within μ ± 2σ; the tail area beyond μ + σ is .1587 and beyond μ + 2σ is .0228]
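The standardize-then-look-up procedure can be mirrored in Python, with the standard library's `statistics.NormalDist` standing in for the Z table. This is only a sketch; the function name `normal_interval_prob` is an illustrative choice.

```python
from statistics import NormalDist

def normal_interval_prob(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma), via Z = (X - mu) / sigma."""
    z = NormalDist()  # standard normal N(0, 1), plays the role of the Z table
    return z.cdf((b - mu) / sigma) - z.cdf((a - mu) / sigma)

# Probability of falling within one and two standard deviations of the mean.
within_one_sd = normal_interval_prob(-1, 1, 0, 1)
within_two_sd = normal_interval_prob(-2, 2, 0, 1)
print(round(within_one_sd, 3), round(within_two_sd, 3))
```

The two values reproduce the 68.3% and 95.4% areas quoted for the Normal curve.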
Sum of i.i.d. random variables: Central Limit Theorem
X1, X2, ..., Xn independent identically distributed random variables: E[Xi] = µ, Var(Xi) = σ²
• For n > 30, $S_n = X_1 + X_2 + \dots + X_n$ is approximately normal with mean $n\mu$ and variance $n\sigma^2$ (standard deviation $\sqrt{n}\,\sigma$)
• For n > 30, $M_n = \dfrac{X_1 + X_2 + \dots + X_n}{n}$ is approximately normal with mean $\mu$ and variance $\sigma^2/n$ (standard deviation $\sigma/\sqrt{n}$)
• The probability distribution of Xi does not matter;
• n does not have to be very large (30 is good enough);
• CLT requires only 2 pieces of information: the mean and SD of Xi
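A small simulation can make the statement about $M_n$ concrete. The sketch below assumes Uniform(0, 1) draws, a sample size of 40 and 2,000 repetitions; all of these are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

n = 40                    # sample size (> 30)
trials = 2000             # number of simulated sample means
mu = 0.5                  # mean of Uniform(0, 1)
sigma = (1 / 12) ** 0.5   # standard deviation of Uniform(0, 1)

# Each entry is one realization of M_n = (X_1 + ... + X_n) / n.
sample_means = [mean(random.random() for _ in range(n)) for _ in range(trials)]

print("mean of M_n:", mean(sample_means), "theory:", mu)
print("sd of M_n:  ", stdev(sample_means), "theory:", sigma / n ** 0.5)
```

Even though the underlying draws are uniform rather than Normal, the simulated mean and standard deviation of $M_n$ should land close to $\mu$ and $\sigma/\sqrt{n}$.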
TOPIC 5: Statistical Sampling
Sample mean of a population
Estimator of the mean of a population (μ): Sample mean $\bar{X}$
$\bar{X} = \dfrac{X_1 + \dots + X_n}{n}$
where X1, …, Xn are n R.V.s following the population distribution (unknown mean μ, unknown std dev σ)
[Figure: population of size N; random sample 1 gives sample mean $\bar{x}_1$, random sample 2 gives sample mean $\bar{x}_2$]
$\bar{X}$ is a random variable!
By Central Limit Theorem, if n > 30, then $\bar{X}$ is approximately normal with mean μ and standard deviation $\sigma/\sqrt{n}$
[Figure: population of size N; random sample 1 gives sample std dev s1, random sample 2 gives sample std dev s2]
$S^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
$S^2$ is an “unbiased estimator” of the variance, i.e. $E[S^2] = \sigma^2$
Confidence interval for sample mean
How confident are we that $\bar{X}$ is a good estimate of the true mean μ of the population?
The realized values of $\bar{X}$ and S in the sample of size n are: $\bar{x}$ and s
If n > 30, then a β% confidence interval for the mean μ is: $\left[\bar{x} - c\,\dfrac{s}{\sqrt{n}},\ \bar{x} + c\,\dfrac{s}{\sqrt{n}}\right]$
c is such that P(-c < Z < c) = β%, where Z~N(0,1):
β = 90 → c = 1.645, β = 95 → c = 1.960, β = 99 → c = 2.576.
What sample size do we need to be sure that the β% confidence interval is within +/- L of the true mean μ? The required sample size is: $n = \dfrac{c^2 s^2}{L^2}$
3 types of CI problems
There are 3 main types of confidence interval problems you should know how to do:
1. Given $\bar{x}$, s, n, β% -> find c -> find the Confidence Interval $[\bar{x} - c\,s/\sqrt{n},\ \bar{x} + c\,s/\sqrt{n}]$
2. Given $\bar{x}$, s, n, L (or the interval itself $[\bar{x} - L,\ \bar{x} + L]$) -> find c -> find the β% confidence level
3. Design Problem: given β%, s, L -> find c -> find the required sample size n
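The three problem types above can be sketched as three small functions. This is an illustration only: the function names and the sample numbers are hypothetical, and `statistics.NormalDist` replaces the Z table.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal, replaces the Z table

def c_for_level(beta):
    """c such that P(-c < Z < c) = beta, with beta given as a fraction (0.95 for 95%)."""
    return Z.inv_cdf((1 + beta) / 2)

def confidence_interval(xbar, s, n, beta):
    """Type 1: given xbar, s, n and the level, return the interval."""
    half = c_for_level(beta) * s / sqrt(n)
    return xbar - half, xbar + half

def confidence_level(s, n, L):
    """Type 2: given s, n and the half-width L, return the confidence level."""
    c = L * sqrt(n) / s
    return 2 * Z.cdf(c) - 1

def required_sample_size(s, L, beta):
    """Type 3 (design): smallest n with half-width at most L, rounded up."""
    return ceil((c_for_level(beta) * s / L) ** 2)

# Hypothetical sample: xbar = 9, s = 20, n = 64.
lo, hi = confidence_interval(xbar=9.0, s=20.0, n=64, beta=0.95)
level = confidence_level(s=20.0, n=64, L=4.9)
n_req = required_sample_size(s=20.0, L=4.9, beta=0.95)
print((lo, hi), level, n_req)
```

All three functions are the same relation $L = c\,s/\sqrt{n}$ solved for a different unknown, which is why the outputs are mutually consistent.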
Confidence interval for proportion
Let X = number of observations in a sample of size n to have a certain characteristic, p = the actual proportion of the population to have that characteristic.
The sample proportion $\hat{p} = X/n$ is approximately normally distributed with mean p, standard deviation $\sqrt{\dfrac{p(1-p)}{n}}$
A β% confidence interval for p is: $\left[\hat{p} - c\sqrt{\dfrac{p(1-p)}{n}},\ \hat{p} + c\sqrt{\dfrac{p(1-p)}{n}}\right]$
where c is that number for which: P(-c < Z < c) = β%, Z~N(0,1)
Note: p is unknown:
• Option 1: replace it by its estimate $\hat{p}$
• Option 2: p = 1/2 (worst case) because p(1-p) ≤ ¼ for all p
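Both options for the unknown p can be compared in a short Python sketch. The poll numbers and the function name `proportion_ci` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, beta, worst_case=False):
    """Confidence interval for a proportion; beta is a fraction (0.95 for 95%)."""
    c = NormalDist().inv_cdf((1 + beta) / 2)
    # Option 1 plugs in the estimate p_hat; Option 2 uses the worst case p = 1/2.
    p = 0.5 if worst_case else p_hat
    half = c * sqrt(p * (1 - p) / n)
    return p_hat - half, p_hat + half

# Hypothetical poll: 40% of 400 respondents have the characteristic.
lo, hi = proportion_ci(0.40, 400, 0.95)
lo_wc, hi_wc = proportion_ci(0.40, 400, 0.95, worst_case=True)
print((lo, hi), (lo_wc, hi_wc))
```

The worst-case interval is never narrower than the plug-in interval, since p(1-p) is largest at p = 1/2.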
Some lessons on simulation
1. Provides more info than average case analysis and simple formulas.
2. You generate random variables that obey a variety of discrete and continuous probability distributions (e.g. uniform, binomial, etc).
3. The results are not precise, due to the inherent randomness in a simulation. We typically obtain estimates of the distributions of particular quantities of interest, and of the means and standard deviations of these distributions. From these distributions, one can derive confidence intervals and other inferences of statistical sampling.
4. The question of how many trials or runs of a simulation can become a complex statistical issue. Fortunately, with today's computing power, this is not a paramount issue for most problems.
5. In practice, one should recognize that gaining managerial confidence in a simulation model will depend on at least three factors:
(i) a good understanding of the underlying management problem,
(ii) one's ability to use the concepts of probability and statistics correctly,
(iii) one's ability to communicate these concepts effectively.
So what happens on the final exam? You may get Sampling questions!
TOPIC 7: Regression
Multiple regression
Explanatory variables: X1, X2, …, Xk taking values x1i, x2i, . . . , xki (i = 1, . . . , n)
Dependent variable: Y taking values yi (i = 1, . . . , n)
Model: $Y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i$
$\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n$ are iid random variables, N(0, σ)
Goal: Choose b0, b1, . . . , bk to minimize the residual sum of squares
$\hat{y}_i = b_0 + b_1 x_{1i} + \dots + b_k x_{ki}$, $e_i = y_i - \hat{y}_i$
Minimize $\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
Regression Output
1) Regression coefficients: b0, b1, . . . , bk are sample estimates of β0, β1, . . . , βk
2) Standard error s: estimate of σ
• a measure of the amount of “noise” in the model
3) Standard errors of the coefficients: sb0, sb1, . . . , sbk
• same role as the estimate of the standard deviation of the sample mean in sampling
• Prior to observing bm and sbm, $\dfrac{b_m - \beta_m}{s_{b_m}}$ has t-dist. with (n - k - 1) d.o.f.
Degrees of freedom: n - (k + 1) = n - k - 1
• n pieces of data; used up (k + 1) degrees of freedom to estimate b0, b1, . . . , bk
• used to test the existence of a linear relationship between Y and xm;
+ What is the 95% confidence interval for βm?
+ Does the interval contain 0?
4) Significance test: Is βm significantly different from zero?
• The β% confidence interval for βm is:
(bm - c × sbm, bm + c × sbm),
where c is such that: P(-c < T < c) = β/100.
Steps to finding the Confidence Interval:
1) d.o.f. = n – k – 1
2) using β% and d.o.f., find c on the t-table
3) using c, bm, sbm write the interval above.
• If zero does not lie in the confidence interval we are confident at the β% level that βm is different from 0.
• If zero lies in the confidence interval, then βm is not significantly different from zero: we should be skeptical that Y depends linearly on xm and we might want to eliminate xm from the model.
6) Coefficient of determination: R2
$R^2 = 1 - \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$ ($\bar{y}$ is the sample mean of the yi's.)
= 1 − (Variation not accounted for by x variables) / (Total variation)
= (Variation that is accounted for by x variables) / (Total variation)
R2 takes values between 0 and 1:
[Figures: two scatter plots of Y against X]
R2 = 1; x values completely account for Y values
R2 = 0; x values account for none of the variation in the Y values
A “good” value of R2 depends on the situation
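To see how the least-squares coefficients and R2 fit together, here is a minimal pure-Python sketch for the special case of one explanatory variable (k = 1). The closed-form slope and intercept are the standard simple-regression formulas; the five data points are made up.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for a single explanatory variable."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def r_squared(xs, ys, b0, b1):
    """R^2 = 1 - (residual sum of squares) / (total variation)."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
r2 = r_squared(xs, ys, b0, b1)
print(b0, b1, r2)
```

With these points, 60% of the total variation in y is accounted for by x.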
Checklist for evaluating linear regression models
• Linearity: If there is only one explanatory variable, construct a scatter-plot of the data to check for linearity. Otherwise, use common sense to decide if a linear relationship is reasonable. (Rule of thumb for choosing the number of factors: n > 5(k + 2))
• Significance tests: Check if the regression coefficients are significantly different from zero.
• Signs of Regression Coefficients: Check to see that the signs make intuitive sense.
• R2: Check if the value of R2 is reasonably high.
• Normality: Check that the residuals are approximately Normally distributed by constructing a histogram of residuals.
• Heteroscedasticity: Do error terms have constant standard deviation? Plot the residuals with the observed values of each of the explanatory variables.
• Autocorrelation: Are error terms independent? If data are time-dependent, plot the residuals over time to check for any apparent patterns.
• Multicollinearity: Are two explanatory variables correlated? Signs: regression coefficients have the “wrong” sign, or we find a high R2 but one or more of the regression coefficients is not significantly different from 0. Look at the correlation matrix. Large positive or negative correlations between the explanatory variables are bad.
Heteroscedasticity
[Figure: residual plots illustrating heteroscedasticity]
Important regression issues you should know
Know how to interpret the regression output. Explain in English what the coefficients mean and give intuition about how they affect the dependent variable.
Know how to build the confidence intervals for the coefficients using the t-table.
Know how to read and interpret the regression graph and the output residual graphs (histogram, autocorrelation, heteroscedasticity).
Know how to improve your model. Check the signs, significance, correlation, etc… which variables to add and drop (explaining why).
Check linearity: if it fails, can you modify your data to make a better model? Example: make a polynomial.
Dummy variables: you need to know how to model categorical data. Example: beer bottles red x green.
TOPIC 8: Linear Optimization
Optimization terminology

Decision Variable: Describes a decision that needs to be made, e.g. how many items to produce.
Objective Function: An expression (in terms of the variables) that needs to be minimized or maximized.
Constraint: An expression that restricts the values of the variables.
Steps in formulation

1. Define the decision variables.
2. Write the objective as a function of these vars. Determine whether max or min.
3. Write the constraints as functions of these vars. Either ≤, ≥, or =.
4. Determine the variable restrictions, e.g. non-negative, integer. Be careful of units!
A Fundamental Point
If an optimal solution exists, there is always a corner point optimal solution!
[Figure: feasible region of a two-variable linear program]
About Shadow Prices
• Associated with each constraint is a shadow price (= 0 for non-binding constraints).
• The shadow price is the change in the objective value per unit change in the right hand side, given all other data remain the same.
• Associated with each shadow price is a range over which this shadow price holds.
• If r.h.s. changes within range: current solution remains optimal, shadow price tells us the rate of change in the optimal objective function value;
• If r.h.s. changes outside range: current solution is not optimal anymore; we need to solve the optimization problem again!
Avoid frequent mistakes!
• Forgetting the non-negativity restrictions
• Confusing Maximizing with Minimizing
• Inconsistent and/or incorrect units
• Wrong interpretation of the shadow prices.
• Change in R.H.S. outside the allowable range
TOPIC 9: Nonlinear Optimization
Some possible cases
[Figures: feasible regions with objective function level curves and the optimal solution marked, including a corner solution. Panels: nonlinear objective with linear constraints; nonlinear objective with nonlinear constraints.]
Local vs global solutions
Optimal solution: A feasible solution that optimizes the objective value among all feasible points.
Local optimal solution: A feasible solution that optimizes the objective value among all feasible points near it.
Example: Minimization in one variable over 2 <= x <= 7
x = 2 is a local optimal solution.
x = 3.5 is a local optimal solution.
x = 5 is the global optimal solution.
[Figure: graph of f(x) for x from 2 to 7]
Computer software for NLP can efficiently find a local opt. BUT! this solution will not necessarily be the global opt.
Shadow prices in NLP

Review: Shadow price of a constraint for LP: Incremental change in the optimal objective function value per unit increase in the right-hand side (RHS) of the constraint.
Shadow price of a constraint for NLP (Lagrangian multiplier): Approximate incremental change in the optimal objective function value with a small change in the RHS.
“Binding” constraint: when satisfied as equality at the optimum.
For nonbinding constraints, shadow prices are zero!
TOPIC 10: Discrete Optimization
Discrete optimization

Feasible region is a set of discrete points.
Can’t be assured a corner point or even boundary solution.
Not as “easy” to solve as LP.
Solving it as an LP provides a relaxation and a bound on the solution.
[Figure: discrete feasible points of a two-variable problem]
Modeling issues
• Decision variables are restricted to take only integer values
• Great modeling flexibility using binary variables
xi = 1, if event i occurs
xi = 0, otherwise
• Allocation of resources (which project to fund)
• Determination of productivity and distribution
More on modeling issues

• If x1 = 0 then x2 = 0: $x_2 \le x_1$
• If x1 = 1 then x2 = 1: $x_2 \ge x_1$
• If x1 = 1 then x2 = 1 and vice versa: $x_2 = x_1$
• If x1 = 1 then x2 = 1 or x3 = 1: $x_1 \le x_2 + x_3$
• Invest in at most 2 projects: $\sum_{i=1}^{10} x_i \le 2$
• Select 5 out of 10 projects: $\sum_{i=1}^{10} x_i = 5$
Key concept: Analyze the logical implication of the constraint in all possible cases
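Checking "all possible cases" is easy to automate for binary variables. The sketch below enumerates every 0/1 assignment and compares each logical statement with its linear constraint; the helper `same_truth_table` is a hypothetical name used only here.

```python
from itertools import product

def same_truth_table(logic, constraint, n_vars):
    """True if the logical statement and the linear constraint agree on every 0/1 assignment."""
    return all(
        bool(logic(*x)) == bool(constraint(*x))
        for x in product((0, 1), repeat=n_vars)
    )

checks = {
    "if x1 = 0 then x2 = 0  <->  x2 <= x1": same_truth_table(
        lambda x1, x2: x1 != 0 or x2 == 0, lambda x1, x2: x2 <= x1, 2),
    "if x1 = 1 then x2 = 1  <->  x2 >= x1": same_truth_table(
        lambda x1, x2: x1 != 1 or x2 == 1, lambda x1, x2: x2 >= x1, 2),
    "x1 = 1 iff x2 = 1  <->  x2 = x1": same_truth_table(
        lambda x1, x2: (x1 == 1) == (x2 == 1), lambda x1, x2: x2 == x1, 2),
    "if x1 = 1 then x2 = 1 or x3 = 1  <->  x1 <= x2 + x3": same_truth_table(
        lambda x1, x2, x3: x1 != 1 or x2 == 1 or x3 == 1,
        lambda x1, x2, x3: x1 <= x2 + x3, 3),
}
for name, ok in checks.items():
    print(name, ok)
```

Each "if A then B" is written as "not A, or B", which is the case-by-case reading of the implication.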
Partial taxonomy of optimization
[Diagram, partially recoverable:]
Linear Optimization: objective and constraints are linear expressions
Nonlinear Optimization: objective and/or constraints are non-linear expressions
Discrete Optimization: variables are restricted to discrete (integer) values
Problems from 2005 final
Problem 1: True or False
(a) If the 95% confidence interval for the sample mean extends from 4 to 14 based on a random sample of size 60, then the sample mean was 9.
TRUE
The interval is centered around the sample mean: $\bar{x} - L = 4$, $\bar{x} + L = 14$
Midpoint: $\bar{x} = (4 + 14)/2 = 9$
(b) If R2 = 0, it means that all the data points in a y-vs-x regression model must fall along the horizontal line.
FALSE
[Figure: scatter plot of y against x]
Problem 1: True or False
(c) A resident of Boston is chosen at random. Consider the 2 events:
I. The person selected is a lawyer;
II. The person selected is a lawyer and an environmental activist.
The probability of event II can never exceed that of event I.
TRUE
[Figure: Venn diagram of lawyers and environmental activists]
Problem 1: True or False
(d) If X has mean 1, standard deviation 2 and Y has mean 1, standard deviation 4, then the standard deviation of Z = X + Y cannot exceed 6.
$\mathrm{Var}(Z) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\sigma_X \sigma_Y\,\mathrm{CORR}(X,Y)$
Max when CORR(X,Y) = 1 → Var(Z) = 4 + 16 + 16 = 36, $\sigma_Z = 6$
TRUE
(e) Mendel asks a random number generator to create 10,000 independent selections from a N(0,1) distribution. The 10,000 selections turn out to have a sample mean of 0.08. Assuming the random generator to work properly, the chance would be less than 1% that the sample mean would fall at least as far as it did from the true mean.
n = 10,000, $\bar{x} = 0.08$. By CLT, $\bar{X} \sim N(0, 1/\sqrt{n})$ (approximately).
$P(\bar{X} \ge 0.08) = P(Z \ge (0.08 - 0)/(1/100)) = P(Z \ge 8) \approx 0$ (8 standard deviations from the mean…)
TRUE
Problem 2 (a)
John has not been feeling well recently and he believes he has a bacterial infection with probability 0.6. He takes a test that is 99% reliable: The probability that the test is positive given that he has an infection is 99%; the probability that the test is negative given that he does not have an infection is 99%.
If the test result is positive, what is the probability that he has an infection?
P(INF) = 0.6, P(!INF) = 0.4
P(test+ | INF) = 0.99, P(test- | !INF) = 0.99
We want: P(INF | test+) = P(INF and test+) / P(test+)
P(test+ | INF) = P(INF and test+) / P(INF) → P(INF and test+) = 0.99 * 0.6 = 0.594
P(test+ | !INF) = P(!INF and test+) / P(!INF) → P(!INF and test+) = 0.01 * 0.4 = 0.004
P(test+) = P(INF and test+) + P(!INF and test+) = 0.594 + 0.004 = 0.598
→ P(INF | test+) = 0.594/0.598 ≈ 0.99
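The same Bayes calculation can be written as a short function. The parameter names `sensitivity` (P(test+ | INF)) and `specificity` (P(test- | !INF)) are labels introduced here for readability; they are not used on the slide.

```python
def posterior(prior, sensitivity, specificity):
    """P(INF | test+) from P(INF), P(test+ | INF) and P(test- | !INF)."""
    joint_pos_inf = sensitivity * prior                # P(INF and test+)
    joint_pos_not = (1 - specificity) * (1 - prior)    # P(!INF and test+)
    return joint_pos_inf / (joint_pos_inf + joint_pos_not)

p_inf_given_pos = posterior(prior=0.6, sensitivity=0.99, specificity=0.99)
print(round(p_inf_given_pos, 4))
```

The unrounded ratio 0.594/0.598 is slightly above 0.99; the slide reports it to two decimals.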

Problem 2 (b)
Statistics show that the number of years a CEO spends in office is normally distributed with mean 5.5 and standard deviation 1.2. Given that a CEO has been in office for exactly 5 years so far, what is the probability that she will still be in office 2 years from now?
X: # years in office: X~N(5.5, 1.2)
[Timeline of office tenure: t = 0, now t = 5, t = 7]
We want: P(X ≥ 7 | X ≥ 5)
P(X ≥ 7 | X ≥ 5) = P(X ≥ 7 and X ≥ 5) / P(X ≥ 5) = P(X ≥ 7) / P(X ≥ 5) = (1 - P(X ≤ 7)) / (1 - P(X ≤ 5))
Z~N(0,1): Look up in Z-table!
P(X ≤ 7) = P(Z ≤ (7 - 5.5)/1.2) = P(Z ≤ 1.25) = 0.8944
P(X ≤ 5) = P(Z ≤ -0.417) = 0.3372
P(X ≥ 7 | X ≥ 5) = (1 - 0.8944)/(1 - 0.3372) = 0.1593
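As a cross-check on the table lookup, the conditional probability can be computed with `statistics.NormalDist`. This is a sketch; the variable names are illustrative.

```python
from statistics import NormalDist

tenure = NormalDist(mu=5.5, sigma=1.2)  # years in office, X ~ N(5.5, 1.2)

p_at_least_7 = 1 - tenure.cdf(7)   # P(X >= 7)
p_at_least_5 = 1 - tenure.cdf(5)   # P(X >= 5)
p_cond = p_at_least_7 / p_at_least_5  # P(X >= 7 | X >= 5)
print(round(p_cond, 4))
```

The result agrees with the slide's 0.1593 to about three decimals; the small gap comes from rounding z = -0.417 to the nearest Z-table entry.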
Problem 3
In a random poll of 100 randomly-selected business leaders, 77% say that they support Bernanke as new chairman of the Fed.
(a) What is the 99% confidence interval for the percentage of all business leaders who support Bernanke?
n = 100, $\hat{p}$ = 77%
99% confidence interval for the proportion: $\left[\hat{p} - c\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}};\ \hat{p} + c\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\right]$
where c is that number for which P(-c < Z < c) = 99%, Z~N(0,1), i.e. c = 2.576
→ 99% confidence interval: [0.66; 0.88]
Problem 3
(b) Ezekiel, who has not seen the results of the poll, wants to find a 95% confidence interval for the percentage of business leaders who support Bernanke. He also wants the interval to extend no more than one percentage point in each direction around its midpoint. Make a sensible estimate of the number of business leaders he should poll.
$n = \dfrac{c^2}{4L^2}$, where L = 1%, and c is that number for which: P(-c < Z < c) = 95%, Z~N(0,1), i.e. c = 1.960
→ n = 9,604 (round up non-integer values!)
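The design calculation can be reproduced in Python under the worst-case assumption p = 1/2 (so p(1-p) = 1/4), which is what the formula above uses. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def worst_case_sample_size(L, beta):
    """Smallest n such that c * sqrt(0.25 / n) <= L, i.e. n >= c^2 / (4 L^2)."""
    c = NormalDist().inv_cdf((1 + beta) / 2)
    return ceil(c ** 2 / (4 * L ** 2))  # round up non-integer values

n_needed = worst_case_sample_size(L=0.01, beta=0.95)
print(n_needed)
```

Because Ezekiel has not seen the poll, he has no estimate of p to plug in, which is why the worst case is the sensible choice here.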
Problem 4 Problem 4
Mendel performs a linear regression analysis on the unemployment rate in Massachusetts (UM) versus the current wholesale price of fuel oil per gallon (P) in Massachusetts in inflation-adjusted dollars.
Using monthly data for a recent six-year period (i.e., using 72 observations), he reaches the least squares equation: UM = 2.10 + 3.00P (P is in dollars and UM in per cent.)
The R^2 value for the regression is 0.66, and the upper end for the 95% confidence interval for the slope of P is 5.00. The sample standard deviation of the monthly Massachusetts unemployment rates over the six-year period studied was 1.00 percent.
Problem 4 (a)-(c)

(a) If fuel oil is projected to cost $1.30 in a forthcoming month, what is the estimate of the Massachusetts unemployment rate for that month based on the regression result?
UM = 2.1 + 3 * (1.3) = 6%
(b) Does the 95% confidence interval for the slope of P include 0?
NO: the CI is symmetric around the estimate 3 and its upper bound is 5: [1, 5]
(c) What is the sum of squared residuals of the 72 data points around the regression line?
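The worked answer to (c) did not survive in this copy of the slides. One possible route, sketched below, uses the relation R2 = 1 − (sum of squared residuals)/(total variation) and assumes the sample standard deviation was computed with the usual n − 1 divisor, so that total variation = (n − 1)·s². Treat the result as an illustration under that assumption, not as the instructors' answer.

```python
n = 72        # number of monthly observations
r2 = 0.66     # R^2 of the regression
s_y = 1.00    # sample standard deviation of UM, in percent

# Assumption: s_y uses the (n - 1) divisor, so total variation = (n - 1) * s_y^2.
total_variation = (n - 1) * s_y ** 2

# R^2 = 1 - SSR / total variation  =>  SSR = (1 - R^2) * total variation
sum_squared_residuals = (1 - r2) * total_variation
print(round(sum_squared_residuals, 2))
```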
Problem 4 (d)

(d) Consider one at a time the following possible patterns among the residuals for this regression analysis. Briefly explain for each pattern whether, by itself, it would substantially reduce your confidence in the regression analysis:
I. The heavy majority of the residuals in the first three years studied were positive, while the heavy majority of those in the second three years were negative.
Autocorrelation: residuals are not random but follow a time-based pattern. (Another acceptable answer would be that the relationship might be nonlinear.)
II. The residuals are consistently larger in the months when the fuel prices are high than in those in which prices are low.
Heteroscedasticity: the residuals consistently get larger with larger values of the independent variable P.
Problem 4 (cont’d)

Fearful of an omitted variable in the regression above, Mendel performs another linear regression on the same data. For each month, the dependent variable is still UM, while the variables on the right are P and UN, the average unemployment rate in the other 49 American states. He reaches the revised regression equation:
UM = 1.50 + 2.00P + 0.50UN
R^2 for the revised regression was 0.75, while the upper ends of the 95% confidence intervals are 6.00 for the slope of P and 1.10 for the slope of UN.
Problem 4 (e)-(f)

(e) Do the regression results provide statistically convincing evidence that UN really belongs in the regression model? Briefly discuss.
NO: both 95% CIs contain 0: P [-2, 6] and UN [-0.1, 1.1]
(f) Suppose that UN and P exhibited strong positive correlation over the six years studied. What general problem in regression analysis might result from that circumstance? How might that problem have affected the regression results?
Multicollinearity: the independent variables are highly correlated among themselves. This may negatively affect the statistical significance of both variables (like in this case).
Problem 5
Recall the Filatoi Riuniti case and linear optimization model, where the firm would like to determine its monthly outsourcing strategy for spun yarn among six other spinning mills as well as their own internal production strategy for spun yarn.
The objective function is to minimize the variable cost (including transportation cost) for meeting demand for the four spun yarn sizes (Extrafine, Fine, Medium, and Coarse).
There are four types of constraints in the model:
1. Filatoi must meet monthly demand for each of the four spun yarn sizes.
2. None of the seven mills can exceed their monthly production capacity.
3. Neither Ambrosi nor De Blasi can produce Extrafine yarn.
4. All decision variables must be nonnegative.
Suppose that demand for spun yarns is the same as in the original case, as are the production capacities and machine hour requirements.
Problem 5
Suppose, however, that over time the variable production and transportation costs have changed, and that the current data for Filatoi Riuniti’s production problem for the coming month of January are shown in Table 1 below.
Problem 5
Roberto Cominetti has re-run the linear optimization model using this new data, resulting in the optimal solution shown in Table 2 along with the Sensitivity Report shown in Table 3. Please answer the following questions based on the linear optimization model solution and Sensitivity Report.
[Tables 2 and 3: optimal solution and Sensitivity Report]
Problem 5 (a)-(b)
(a) What are the binding constraints in the model? In the optimal plan for the coming month, which spinning mills would use all of their spinning capacity to produce spun yarn for Filatoi Riuniti?
All the constraints are binding, except for Capacity at Giuliani. This is the only mill whose capacity is not fully used under the optimal strategy.
(b) What would be the cost impact of increasing the required production of Extrafine yarn from 25,000 kg to 27,000 kg? What can you say, if anything, about the cost impact of increasing the required production of Extrafine yarn from 25,000 kg to 29,000 kg?
Shadow Price = 18.397 ($/kg). Max increment allowed: +3,197.5 kg.
Additional Costs = +2,000 kg * $18.397/kg = $36,794 / month
Max Additional Costs = +3,197.5 kg * $18.397/kg = $58,824.4 (for additional 3,197.5 kg). Nothing can be said for the remaining 802.5 kg except that they would cost at least $18.397/kg.
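The shadow-price arithmetic in (b) can be sketched in a few lines. This is only an illustration: the shadow price and allowable increase are the two numbers quoted in the answer above, while the function name and the split into "covered" and "uncovered" kilograms are assumptions made for the example.

```python
# Values quoted in the answer to (b); everything else is illustrative.
SHADOW_PRICE = 18.397        # $/kg for the Extrafine demand constraint
ALLOWABLE_INCREASE = 3197.5  # kg, range over which the shadow price is valid


def cost_of_extra_demand(extra_kg):
    """Return (cost priced at the shadow price, kg beyond the valid range)."""
    covered = min(extra_kg, ALLOWABLE_INCREASE)
    uncovered = extra_kg - covered
    return covered * SHADOW_PRICE, uncovered


cost_2000, left_2000 = cost_of_extra_demand(2000)  # 25,000 kg -> 27,000 kg
cost_4000, left_4000 = cost_of_extra_demand(4000)  # 25,000 kg -> 29,000 kg
print(round(cost_2000, 2), left_2000)
print(round(cost_4000, 2), left_4000)
```

For the 4,000 kg increase the function only prices the part inside the allowable range; the remaining kilograms are reported separately because the Sensitivity Report gives only a lower bound on their cost.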
Problem 5 (c)-(d)
(c) Another local spinning mill by the name of Havarti has informed Filatoi that they can produce Fine spun yarn for Filatoi for a delivered cost of $14.25/kg. Should Filatoi consider entering into an agreement with Havarti to produce Fine spun yarn at this price?
NO: The shadow price for the demand of fine is 14.018. Hence, if we were to produce less fine yarn with the current machines and outsource it to Havarti, we would save 14.018 per Kg, and the extra cost would be 14.25 per Kg, so it is not worth it.
(d) According to the model’s data, monthly capacity at De Blasi is 2,600 spinning machine hours. However, Filatoi Riuniti has just received an email from the outsourcing manager at De Blasi indicating that capacity for the coming month will be curtailed to 2,200 spinning machine hours due to some unanticipated machine maintenance. How much will this change the total variable cost of producing and/or outsourcing spun yarn in the coming month?
Shadow Price = -.086 ($/hour) Additional costs = (-400 hours) * (-$.086/hour) = $34.4
Problem 5 (e)
(e) How much do you think Giuliani would have to reduce the price they charge Filatoi Riuniti for Fine spun yarn in order for Filatoi to want to discuss outsourcing production of Fine spun yarn to them?
The shadow price for Fine yarn is $14.02. Giuliani would have to reduce their price below this level.
Problem 6
Forest Capital (FC) has decided to appoint Sarah Edwards as the new portfolio manager of its portfolio of technology and utility stocks in emerging markets, which is currently comprised of various amounts in ten different companies.
Table 4 below shows the current portfolio weights, the latest annualized expected return and standard deviation estimates, and the classifications of each of the ten companies.
The estimated correlations among the returns of the ten companies are shown in Table 5. Note in Table 5 that FC assumes for simplicity that returns among stocks are uncorrelated except among stocks A, B, and C.
Problem 6
Sarah has decided to use an optimization model to select the new weights of the portfolio for the coming month. She would like to maximize the expected return of the portfolio subject to the following constraints:
1. The standard deviation of the resulting portfolio should be at most 8%.
2. The amount of turnover of the portfolio should be at most 30%. As an example of how turnover is calculated, if prior to trading a portfolio has 70% of its funds in Stock 1 and 30% in Stock 2, and after the trade the weights are 60% for Stock 1 and 40% for Stock 2, the turnover of the portfolio is (|70-60| + |30-40|) = 20%.
3. Last month, the total portfolio weight in technology stocks was 0.08 + 0.07 + 0.17 + 0.09 + 0.08 = 0.49 = 49%. Sarah would like to maintain the character of the portfolio as a balanced portfolio between technology and utility stocks. For this reason, she would like the total weight of the portfolio in technology stocks to be between 45% and 55%.
4. All portfolio weights need to be nonnegative. That is, short positions are not allowed in the portfolio.
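The turnover measure described in constraint 2 can be sketched as follows. The function name is an assumption made for illustration; the numbers are the 70/30 to 60/40 example from the problem statement.

```python
def turnover(old_weights, new_weights):
    """Sum of absolute weight changes across all stocks."""
    return sum(abs(old - new) for old, new in zip(old_weights, new_weights))


# Example from the problem statement: 70%/30% before, 60%/40% after.
example = turnover([0.70, 0.30], [0.60, 0.40])
print(round(example, 2))
```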

Problem 6 (a)
(a) Write down a formulation of a nonlinear optimization model to determine the new weights of the portfolio.
\(w_i\) = fraction of the resulting portfolio invested in stock \(i\)
Objective: \(\max\; .08w_1 + .12w_2 + .15w_3 + .11w_4 + \dots + .08w_9 + .05w_{10}\)
Subject to:
\(w_1 + w_2 + w_3 + w_4 + \dots + w_9 + w_{10} = 1\) (fractions)
\(\left[(.13)^2 w_1^2 + (.25)^2 w_2^2 + (.35)^2 w_3^2 + \dots + (.07)^2 w_{10}^2 + 2(.13)(.25)(.4)\,w_1 w_2 + 2(.13)(.35)(-.1)\,w_1 w_3 + 2(.25)(.35)(.1)\,w_2 w_3\right]^{1/2} \le .08\)
\(|w_1 - .12| + |w_2 - .08| + |w_3 - .07| + \dots + |w_{10} - .08| \le .3\)
\(w_2 + w_3 + w_5 + w_7 + w_8 \le .55\)
\(w_2 + w_3 + w_5 + w_7 + w_8 \ge .45\)
\(w_i \ge 0\) (for each \(i\) from 1 to 10)
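To make the standard-deviation constraint concrete, here is a minimal sketch of the portfolio standard deviation formula. The standard deviations (.13, .25, .35) and correlations (.4, -.1, .1) are the ones that appear in the formulation for the first three stocks; the three-stock sub-portfolio and its weights are hypothetical and chosen only to exercise the formula.

```python
import math


def portfolio_std(weights, stds, corr):
    """Standard deviation of a weighted sum of returns.

    Variance = sum over i, j of w_i * w_j * sd_i * sd_j * corr[i][j].
    """
    n = len(weights)
    variance = 0.0
    for i in range(n):
        for j in range(n):
            variance += weights[i] * weights[j] * stds[i] * stds[j] * corr[i][j]
    return math.sqrt(variance)


stds = [0.13, 0.25, 0.35]        # first three stocks, from the formulation
corr = [
    [1.0, 0.4, -0.1],
    [0.4, 1.0, 0.1],
    [-0.1, 0.1, 1.0],
]
weights = [0.5, 0.3, 0.2]        # hypothetical weights, not from the problem

sigma = portfolio_std(weights, stds, corr)
print(round(sigma, 4), sigma <= 0.08)
```

The double loop counts each off-diagonal pair twice, which reproduces the factor of 2 in front of the cross terms in the written constraint.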
Problem 6 (b)
(b) Suppose that in order to trim the rather excessive transaction costs in emerging markets, Sarah would like to limit her portfolio to stocks in only six different companies. How would you augment your formulation of the model using binary variables to incorporate this requirement into the model?
Problem 6 (c)
(c) Suppose that Sarah would like to limit the number of trades to at most seven of the ten companies. (Note that if a stock’s weight does not change, it does not produce a trade.) By defining binary variables, describe how you would augment your model to incorporate this additional requirement as well.
Good luck!
There are things MBAs can’t solve …
For everything else, there is DMD!
Additional Practice Problems
TOPIC 2: Discrete Random Variables
Beer and Coke daily sales at a soccer stadium
Probability p_i          0.15   0.27   0.15   0.26   0.17
X: # of Beer Cans x_i      35     78     81     30     16
Y: # of Coke Cans y_i      41     10      0     13     42
[Figure: Scatter Plot of Daily Sales of Beer and Coke; horizontal axis: Beer Sales, vertical axis: Coke Sales]
The Beer and Coke Example
Some Questions
• What is the expected number of beer cans sold? of coke cans sold?
• What is the standard deviation of beer cans sold? of coke cans?
• What is the covariance and the correlation of beer and coke cans sold?
• What is the expected daily revenue?
• What is the standard deviation of the daily revenue?
Summary of Daily Beer Sales
The expected number of beer cans sold is \(\mu_X = E(X) = \sum_i P(X = x_i)\, x_i\)
The variance of beer cans sold is \(\sigma_X^2 = VAR(X) = \sum_i P(X = x_i)(x_i - \mu_X)^2\)
The standard deviation of beer cans sold is \(\sigma_X = \sqrt{VAR(X)}\)
Here it turns out that \(P(X = x_i) = p_i\), but usually \(P(X = x_i) = \sum_j P(X = x_i;\, Y = y_j)\)
Prob.   # Beer Cans   p_i * x_i   p_i * (x_i - E(X))^2
0.15        35           5.25          29.32
0.27        78          21.06         227.38
0.15        81          12.15         153.79
0.26        30           7.80          93.66
0.17        16           2.72         184.91
E(X) = 48.98   VAR(X) = 689.06   StdDev(X) = 26.25
Summary of Daily Coke Sales
Prob.   # Coke Cans   p_i * y_i   p_i * (y_i - E(Y))^2
0.15        41           6.15          70.18
0.27        10           2.70          23.71
0.15         0           0.00          56.28
0.26        13           3.38          10.55
0.17        42           7.14          87.06
E(Y) = 19.37   VAR(Y) = 247.77   StdDev(Y) = 15.74
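The summary statistics for beer and coke sales can be reproduced directly from the probability table. The sketch below uses the five outcomes and probabilities given earlier; the helper names are assumptions made for the example.

```python
import math

probs = [0.15, 0.27, 0.15, 0.26, 0.17]
beer = [35, 78, 81, 30, 16]   # X: number of beer cans
coke = [41, 10, 0, 13, 42]    # Y: number of coke cans


def expected_value(p, values):
    return sum(pi * vi for pi, vi in zip(p, values))


def variance(p, values):
    mu = expected_value(p, values)
    return sum(pi * (vi - mu) ** 2 for pi, vi in zip(p, values))


mean_x, var_x = expected_value(probs, beer), variance(probs, beer)
mean_y, var_y = expected_value(probs, coke), variance(probs, coke)
sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)

print(round(mean_x, 2), round(var_x, 2), round(sd_x, 2))
print(round(mean_y, 2), round(var_y, 2), round(sd_y, 2))
```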
The covariance of beer and coke cans sold is \(COV(X,Y) = \sum_i p_i (x_i - \mu_X)(y_i - \mu_Y)\)
The correlation of beer and coke cans sold is \(CORR(X,Y) = COV(X,Y) / (\sigma_X \sigma_Y)\)
[Figure: Scatter Plot of Daily Sales of Beer and Coke; horizontal axis: Beer Sales, vertical axis: Coke Sales]
Summary of Daily Beer and Coke Sales
Prob. p_i   Number of Beer Cans x_i   Number of Coke Cans y_i   p_i (x_i - E(X)) (y_i - E(Y))
0.15              35                        41                       -45.36
0.27              78                        10                       -73.42
0.15              81                         0                       -93.03
0.26              30                        13                        31.43
0.17              16                        42                      -126.88
E(X) = 48.98   E(Y) = 19.37   StdDev(X) = 26.25   StdDev(Y) = 15.74
COV(X,Y) = -307.25   Correlation = -0.74
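The covariance and correlation can be checked with a short sketch that sums the per-outcome contributions from the probability table. The function name is an assumption made for the example; the data are the five joint outcomes from the table.

```python
import math

probs = [0.15, 0.27, 0.15, 0.26, 0.17]
beer = [35, 78, 81, 30, 16]
coke = [41, 10, 0, 13, 42]


def covariance(p, xs, ys):
    """COV(X, Y) = sum of p_i * (x_i - E(X)) * (y_i - E(Y))."""
    mean_x = sum(pi * x for pi, x in zip(p, xs))
    mean_y = sum(pi * y for pi, y in zip(p, ys))
    return sum(pi * (x - mean_x) * (y - mean_y) for pi, x, y in zip(p, xs, ys))


cov_xy = covariance(probs, beer, coke)
sd_x = math.sqrt(covariance(probs, beer, beer))  # COV(X, X) = VAR(X)
sd_y = math.sqrt(covariance(probs, coke, coke))
corr_xy = cov_xy / (sd_x * sd_y)
print(round(cov_xy, 2), round(corr_xy, 2))
```

Note that the fourth outcome (30 beers, 13 cokes) contributes a positive term, since both values are below their means; the other four contributions are negative.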
Some More Questions About Beer and Coke
X = number of cans of beer sold; Y = number of cans of coke sold
Revenues: $3 per can of beer, $2 per can of coke
Daily revenue (in $) = 3X + 2Y
Given: E(X) = 48.98, E(Y) = 19.37, StdDev(X) = 26.25, StdDev(Y) = 15.74, Cov(X,Y) = -307.25
• What is the expected daily revenue?
E(3X + 2Y) = 3 E(X) + 2 E(Y) = $3 * 48.98 + $2 * 19.37 = $185.68
• What is the standard deviation of the daily revenue?
VAR(3X + 2Y) = 3^2 * VAR(X) + 2^2 * VAR(Y) + 2 * 3 * 2 * COV(X,Y) = 9 * 689 + 4 * 248 + 12 * (-307) = 3,509
StdDev(3X + 2Y) = sqrt(3,509), or approximately $59.2
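The revenue calculation applies the rules for the mean and variance of a linear combination aX + bY. A minimal sketch, using the summary statistics stated for beer and coke (unrounded to two decimals rather than to whole numbers, so the variance differs slightly from a hand calculation with rounded inputs):

```python
import math

# Summary statistics from the beer and coke example.
mean_x, mean_y = 48.98, 19.37
var_x, var_y = 689.06, 247.77
cov_xy = -307.25

a, b = 3, 2  # $ per can of beer, $ per can of coke

revenue_mean = a * mean_x + b * mean_y
revenue_var = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy
revenue_sd = math.sqrt(revenue_var)
print(round(revenue_mean, 2), round(revenue_var, 2), round(revenue_sd, 2))
```

The negative covariance lowers the revenue variance: days with high beer sales tend to be days with low coke sales, so the two revenue streams partially offset each other.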
TOPIC 4: Continuous Random Variables
The amazon.com example
• The time X, in minutes, spent surfing amazon.com last month by people in this auditorium is normally distributed: \(X \sim N(170, 10)\)
• What if we triple the time spent of a randomly chosen student? \(Y = 3X\)
\(\mu_Y = 3\mu_X = 3(170) = 510\)
\(\sigma_Y^2 = 3^2 \sigma_X^2 = 9(100) = 900\), \(\sigma_Y = 30\)
• Let’s take 3 independent students at random and combine the time they spent on amazon.com last month: \(Y = X_1 + X_2 + X_3\)
\(\mu_Y = 3\mu_X = 3(170) = 510\)
\(\sigma_Y^2 = Var(X_1 + X_2 + X_3) = 3(100) = 300\), \(\sigma_Y = 17.32\)
The amazon.com example
What is the probability that a randomly selected student has spent between 160 and 180 minutes on amazon.com last month? \(X \sim N(170, 10)\)
\(P[160 < X < 180] = P[(160-170)/10 < (X-\mu)/\sigma < (180-170)/10] = P[-1 < Z < 1] = F(1) - F(-1) = 0.8413 - 0.1587 = 0.6826\)
What is the probability that three independent students together have spent more than 460 minutes?
\(Y = X_1 + X_2 + X_3 \sim N(510, 17.32)\)
\(P(Y > 460) = P((Y-510)/17.32 > (460-510)/17.32) = P(Z > -2.89) = 1 - P(Z < -2.89) = 1 - 0.0019 = 0.9981\)
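These Normal probabilities can be checked without a table using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The variable names are assumptions made for the example; results differ from the table-based answers only in the fourth decimal place.

```python
import math
from statistics import NormalDist

x = NormalDist(mu=170, sigma=10)  # one student's minutes on amazon.com

# P(160 < X < 180)
p_between = x.cdf(180) - x.cdf(160)

# Tripling one student's time: Y = 3X scales the standard deviation by 3.
tripled = NormalDist(mu=3 * 170, sigma=3 * 10)

# Summing three independent students: variances add, so sd = sqrt(3) * 10.
total = NormalDist(mu=3 * 170, sigma=math.sqrt(3) * 10)
p_more_than_460 = 1 - total.cdf(460)

print(round(p_between, 4), round(total.stdev, 2), round(p_more_than_460, 4))
```

The contrast between `tripled` (standard deviation 30) and `total` (standard deviation about 17.32) is the point of the example: 3X and X1 + X2 + X3 have the same mean but different spreads.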
TOPIC 5: Statistical Sampling
Annual income example
After having managed to successfully survey 100 families we have found that the observed sample mean of the annual income is $19,763 while the observed sample standard deviation is $4,000.
a) What is the distribution of the sample mean (including the form of the distribution, its mean and standard deviation)?
The sample mean \(\bar{X}\) follows a normal distribution with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\): \(N(\mu, \sigma/\sqrt{n})\)
b) What is the probability that the sample mean will be within $784 of the population mean?
\(P(-784 < \bar{X} - \mu < 784)\)
\(= P\left(-784/(\sigma/\sqrt{n}) < (\bar{X} - \mu)/(\sigma/\sqrt{n}) < 784/(\sigma/\sqrt{n})\right)\)
\(\approx P\left(-784/(s/\sqrt{n}) < Z < 784/(s/\sqrt{n})\right)\)
\(= P\left(-784/(4000/\sqrt{100}) < Z < 784/(4000/\sqrt{100})\right)\)
\(= P(-1.96 < Z < 1.96) = P(Z < 1.96) - P(Z < -1.96) = 0.975 - 0.025 = 0.95\)
c) What is a number L such that the probability that the sample mean is within L of the population mean is 99%?
A 99% confidence interval for the population mean is given by: \([\bar{x} - c \cdot s/\sqrt{n},\; \bar{x} + c \cdot s/\sqrt{n}]\)
(where c = 2.576 (99%), s = 4000, n = 100, and \(\bar{x}\) = 19,763)
Therefore, \(L = c \cdot s/\sqrt{n} = 1030.4\)
So the 99% confidence interval is given by: \([19{,}763 - 2.576 \cdot 4000/\sqrt{100},\; 19{,}763 + 2.576 \cdot 4000/\sqrt{100}]\) = [18,732.6, 20,793.4]
d) How many families should we successfully survey so that the probability that the sample mean is within $200 of the population mean is 95%?
To construct a 95% interval that is within $200 of the population mean, the required sample size n is given by:
n = c^2 s^2 / L^2 = 1.96^2 * 4000^2 / 200^2 = 1,536.64, so we should survey 1,537 families.
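Parts (b) through (d) of the annual income example all rest on the standard error s/sqrt(n). The sketch below recomputes them with `statistics.NormalDist`; the helper name `z_for` is an assumption, and because it uses unrounded critical values (about 2.5758 and 1.9600 instead of the table values 2.576 and 1.96) the half-width differs from 1030.4 in the first decimal.

```python
import math
from statistics import NormalDist

x_bar, s, n = 19763, 4000, 100
std_error = s / math.sqrt(n)  # 400


def z_for(confidence):
    """Two-sided critical value c with P(-c < Z < c) = confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


# (b) P(sample mean within $784 of the population mean)
p_within_784 = 2 * NormalDist().cdf(784 / std_error) - 1

# (c) half-width L and the 99% confidence interval
half_width_99 = z_for(0.99) * std_error
interval_99 = (x_bar - half_width_99, x_bar + half_width_99)

# (d) sample size so that a 95% interval has half-width $200
n_required = math.ceil((z_for(0.95) * s / 200) ** 2)

print(round(p_within_784, 3), round(half_width_99, 1), n_required)
```

The sample size is rounded up because a fractional family cannot be surveyed and rounding down would give an interval slightly wider than $200.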
The Room Service Example
A hotel manager would like to find out the mean time guests have to wait for room service. For a sample of 45 guests the observed sample mean turned out to be 32 minutes while the observed standard deviation was 11 minutes.
• What is the 95% confidence interval for the mean time guests have to wait for room service?
We assume the mean time guests have to wait for room service is approximately Normal.
A 95% confidence interval for the mean time guests have to wait is given by: \([\bar{x} - c \cdot s/\sqrt{n},\; \bar{x} + c \cdot s/\sqrt{n}]\), where c = 1.96 (95%), s = 11, and \(\bar{x}\) = 32
So the 95% confidence interval is given by:
\([32 - 1.96 \cdot 11/\sqrt{45},\; 32 + 1.96 \cdot 11/\sqrt{45}]\) = [28.79, 35.21]
TOPIC 7: Regression
An Ice Cream Example
The fat content in a gallon of chocolate ice cream is believed to depend on Cream, Chocolate and Sugar according to:
Fat = A + B*Cream + C*Chocolate + D*Sugar
A multiple regression was run on data from 20 different batches of chocolate ice cream:
R Square: 0.8433
Standard Error: 13.73
                  Coefficients   Standard Error   t-Stat.   Lower 95%   Upper 95%
Intercept             -8.94          19.95         -0.45     -51.24       33.35
Cream (ounces)         0.93           0.12          7.80       0.67        1.18
Choc. (ounces)         2.07           0.60          ???        ???         ???
Sugar (ounces)         2.47           1.33          1.86      -0.34        5.29
An Ice Cream Example
Correlation between different variables:
                  Fat (gm)   Cream (ounces)   Choc. (ounces)   Sugar (ounces)
Fat (gm)             1
Choc. (ounces)       0.486       0.025             1
Sugar (ounces)       0.280      -0.099             0.409            1
• Compute the 95% CI for Choc. coefficient
• Critique model
An Ice Cream Example
• Compute the 95% CI for Choc. Coefficient
                  Coefficients   Standard Error   t-Stat.   Lower 95%   Upper 95%
Intercept             -8.94          19.95         -0.45     -51.24       33.35
Cream (ounces)         0.93           0.12          7.80       0.67        1.18
Choc. (ounces)         2.07           0.60          ???        ???         ???
Sugar (ounces)         2.47           1.33          1.86      -0.34        5.29
The 95% confidence interval for the Choc. coefficient, using c = 2.120 from the T-table, will be:
[2.07 - 2.120 * 0.60, 2.07 + 2.120 * 0.60] = [0.798, 3.342]
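The missing entries in the Choc. row follow from the coefficient and its standard error. In this sketch the critical value 2.120 is taken from the text as given (it corresponds to 20 - 4 = 16 degrees of freedom) and is not recomputed; the variable names are assumptions.

```python
coefficient = 2.07     # Choc. (ounces) coefficient from the regression output
std_error = 0.60       # its standard error
t_critical = 2.120     # from the t-table, as quoted in the text

t_stat = coefficient / std_error
lower = coefficient - t_critical * std_error
upper = coefficient + t_critical * std_error

# The coefficient is significant at the 5% level if 0 lies outside the interval.
is_significant = not (lower <= 0 <= upper)
print(round(t_stat, 2), round(lower, 3), round(upper, 3), is_significant)
```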
Signs of Regression Coefficients
                  Coefficients   Standard Error   t-Stat.   Lower 95%   Upper 95%
Intercept             -8.94          19.95         -0.45     -51.24       33.35
Cream (ounces)         0.93           0.12          7.80       0.67        1.18
Choc. (ounces)         2.07           0.60          3.45       0.80        3.34
Sugar (ounces)         2.47           1.33          1.86      -0.34        5.29
The coefficients for Cream, Choc. and Sugar appear to make sense.
Significance test: 0 is in the confidence interval for the Sugar coefficient, so Sugar should be excluded from the regression.
R^2: The value for R^2 is 0.8433, which indicates that the model has a high level of prediction.
Multicollinearity:
                  Fat (gm)   Cream (ounces)   Choc. (ounces)   Sugar (ounces)
Fat (gm)             1
Choc. (ounces)       0.486       0.025             1
Sugar (ounces)       0.280      -0.099             0.409            1
There is a high correlation between chocolate and sugar (>0.4), hence we should eliminate one of these variables: sugar, because of its low t-statistic.
Heteroscedasticity: [Figure: residual plot]
Autocorrelation: [Figure: Residuals vs. Sample Number]
Residual Distribution: [Figure: histogram of Residual Frequency; horizontal axis: Residual, vertical axis: Frequency]
The residuals appear to be normally distributed.
TOPIC 8: Linear Optimization
Different Modes of Driving Example
Different Modes of Driving: manufacturer of cars & trucks.
Vehicles are processed in the paint and body shops.
Painting trucks takes 1.5 times as much time as painting cars. If the paint shop only paints trucks, then it paints 40 trucks/day. If it only paints cars, then 60 cars/day.
Body work on cars and trucks takes the same amount of time. If the body shop only produces trucks, then 50/day. If it only produces cars, then 50/day.
Trucks contribute $500 and cars contribute $400 to profit.
Determine daily production schedule to maximize profits.
Decision Variables: C = # cars, T = # trucks
Objective Function: Max 400 C + 500 T
Constraints:
Paint Shop: T/40 + C/60 <= 1 day
Body Shop: T/50 + C/50 <= 1 day
C >= 0, T >= 0
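Because this model has only two decision variables, the optimum can be found by checking the corner points of the feasible region. The sketch below does that by brute force in plain Python; it is an illustration of the graphical method, not the solver used to produce the course's reports, and all function names are assumptions.

```python
from itertools import combinations

# Constraints written as a*C + b*T <= rhs (C = cars/day, T = trucks/day).
CONSTRAINTS = [
    (1 / 60, 1 / 40, 1.0),  # paint shop: T/40 + C/60 <= 1
    (1 / 50, 1 / 50, 1.0),  # body shop:  T/50 + C/50 <= 1
    (-1.0, 0.0, 0.0),       # C >= 0
    (0.0, -1.0, 0.0),       # T >= 0
]


def profit(cars, trucks):
    return 400 * cars + 500 * trucks


def intersection(c1, c2):
    """Solve the 2x2 system where both constraints hold with equality."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None  # parallel boundary lines never meet
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)


def is_feasible(point, tol=1e-9):
    cars, trucks = point
    return all(a * cars + b * trucks <= rhs + tol for a, b, rhs in CONSTRAINTS)


# Every pair of boundary lines gives a candidate corner; keep the feasible ones.
vertices = []
for c1, c2 in combinations(CONSTRAINTS, 2):
    point = intersection(c1, c2)
    if point is not None and is_feasible(point):
        vertices.append(point)

best_cars, best_trucks = max(vertices, key=lambda p: profit(*p))
best_profit = profit(best_cars, best_trucks)
print(round(best_cars, 2), round(best_trucks, 2), round(best_profit, 2))
```

The best corner is the one where the paint shop and body shop constraints intersect, which is why both constraints are binding at the optimum.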
Different Modes of Driving Example...
Which are the Binding Constraints?
[Figure: Optimal Solution. Feasible Region in the plane of Cars (horizontal axis) and Trucks (vertical axis), with labeled boundaries T/40 + C/60 <= 1, C >= 0 and T >= 0, and isoprofit lines at Profit: $5,000/Day and Profit: $10,000/Day.]
Different Modes of Driving Example...
Adjustable Cells
Cell    Name                        Final Value   Reduced Cost   Objective Coefficient   Allowable Increase   Allowable Decrease
$B$2    Cars Decision Variables         30             0               400                   100               66.66666667
$B$3    Trucks Decision Variables       20             0               500                   100               100
Constraints
Cell    Name                        Final Value   Shadow Price   Constraint R.H. Side    Allowable Increase   Allowable Decrease
$B$7    Paint Shop Constraints           1          12000                1                    0.25              0.166666667
$B$8    Body Shop Constraints            1          10000                1                    0.2               0.2
• An outside contractor offers to paint 8 more trucks (or 12 more cars) per day for $2,000. Should we accept the offer?
Yes, based on the shadow prices, this expansion is worth:
$12,000 * 8/40 - $2,000 = $400
and the increased capacity of 8/40 or 0.2 is within the allowable increase.
• If the DMD company was given extra labor to increase productivity in the body shop by 5 cars (or trucks), what would DMD’s profits become?
Increased profit is $10,000 * 5/50 = $1,000
and the increased capacity of 5/50 or 0.1 is within the allowable increase.
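Both answers follow the same pattern: check that the change in the right-hand side stays within the allowable increase, then multiply by the shadow price. A minimal sketch, with the shadow prices and allowable increases copied from the Sensitivity Report and the dictionary layout and function name assumed for illustration:

```python
# Shadow prices and allowable increases from the Sensitivity Report.
REPORT = {
    "paint": {"shadow_price": 12000, "allowable_increase": 0.25},
    "body": {"shadow_price": 10000, "allowable_increase": 0.2},
}


def value_of_increase(constraint, delta_rhs):
    """Profit gain from raising a constraint's RHS by delta_rhs (in days)."""
    row = REPORT[constraint]
    if delta_rhs > row["allowable_increase"]:
        raise ValueError("change exceeds the range where the shadow price holds")
    return row["shadow_price"] * delta_rhs


# Contractor paints 8 more trucks/day (8/40 of a paint-shop day) for $2,000.
paint_net_gain = value_of_increase("paint", 8 / 40) - 2000

# Extra labor adds 5 vehicles/day (5/50 of a body-shop day).
body_gain = value_of_increase("body", 5 / 50)

print(round(paint_net_gain, 2), round(body_gain, 2))
```

Raising an error outside the allowable range mirrors the point made in the Filatoi problem: beyond that range the Sensitivity Report alone does not determine the change in profit.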
Economic Interpretation
[Figure: optimal profit (vertical axis, 0 K to 20 K) versus the Value of Paint Shop right-hand side (horizontal axis, marked at 0, 0.83, 1.00, 1.25); a piecewise-linear curve with segments labeled Slope = 24,000, Slope = 12,000, and Slope = 0.]
In this range, every unit change in the RHS results in a $12,000 unit change in the objective function.
This value is called the shadow price of the constraint over this range.
TOPIC 9: Nonlinear Optimization
A Production Example
You are producing three products A, B and C. You need to satisfy production limits and resource availability constraints: (1) You can produce at most 1000, 800 and 700 units of A, B and C respectively; (2) The data for resource availability is as follows:
              A      B      C      Resources (in hours)
machine 1     2      1      3.5        1000
machine 2     0.2    0.8    1.2         350
Production levels influence market price of each product:
PA = 200 - XA + 0.5 XB, PB = 100 - 2 XB + 0.25 XA, PC = 500 - XC
We want to maximize revenue.
Formulation
Decision Variables
X1A = product A to be produced by machine 1
X1B = product B to be produced by machine 1
X1C = product C to be produced by machine 1
X2A = product A to be produced by machine 2
X2B = product B to be produced by machine 2
X2C = product C to be produced by machine 2
Objective Function
Max PA * (X1A + X2A) + PB * (X1B + X2B) + PC * (X1C + X2C)
More Formulation...
Subject to:
Price:
PA = 200 - (X1A + X2A) + 0.5 * (X1B + X2B)
PB = 100 - 2 * (X1B + X2B) + 0.25 * (X1A + X2A)
PC = 500 - (X1C + X2C)
Resource:
Machine 1: 2 X1A + X1B + 3.5 X1C <= 1000
Machine 2: 0.2 X2A + 0.8 X2B + 1.2 X2C <= 350
Production Limit:
A: X1A + X2A <= 1000
B: X1B + X2B <= 800
C: X1C + X2C <= 700
Non-negativity: X1A, X2A, X1B, X2B, X1C, X2C, PA, PB, PC >= 0
Excel Solution
MAX: maximize revenues
                         A         B         C
Decision Variables:     58.82     23.53    125.00   (units)
                        58.82     23.53    125.00   (units)
Price A: $105.88   Price B: $35.29   Price C: $250.00
Objective Function: $76,617.65
Machine 1                2         1         3.5
Machine 2                0.2       0.8       1.2
Machine Limit: 1000 hours
Capacity Limit:       1000       800       700   (units)
Constraints:             LHS            RHS
machine 1 capacity      578.68    <=    1000
machine 2 capacity      180.59    <=     350
product A limit         117.65    <=    1000
product B limit          47.06    <=     800
product C limit         250.00    <=     700
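The reported solution can be sanity-checked by plugging it back into the price equations, the revenue objective, and the machine-hour constraints. The sketch below does only that; it does not solve the nonlinear program. It assumes the first row of decision variables is machine 1 and the second is machine 2 (the two rows are identical in the report), and it uses the two-decimal values shown, so results match the report only up to rounding.

```python
# Reported production quantities, rounded to two decimals as in the report.
# Assumption: first row is machine 1, second row is machine 2.
x1 = {"A": 58.82, "B": 23.53, "C": 125.00}
x2 = {"A": 58.82, "B": 23.53, "C": 125.00}

total = {k: x1[k] + x2[k] for k in x1}

# Price equations from the formulation.
price = {
    "A": 200 - total["A"] + 0.5 * total["B"],
    "B": 100 - 2 * total["B"] + 0.25 * total["A"],
    "C": 500 - total["C"],
}

revenue = sum(price[k] * total[k] for k in total)

# Machine-hour usage (left-hand sides of the resource constraints).
machine1_hours = 2 * x1["A"] + 1 * x1["B"] + 3.5 * x1["C"]
machine2_hours = 0.2 * x2["A"] + 0.8 * x2["B"] + 1.2 * x2["C"]

print({k: round(v, 2) for k, v in price.items()})
print(round(revenue, 2), round(machine1_hours, 2), round(machine2_hours, 2))
```

None of the resource or production-limit constraints is tight at this point, which suggests the optimum is determined by the downward-sloping price equations rather than by capacity.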
Sensitivity Report
Microsoft Excel 10.0 Sensitivity Report
Worksheet: [Book1]Products
Report Created: 12/9/2004 10:19:06 PM
Adjustable Cells
Cell    Name     Final Value   Reduced Gradient
$B$8    units       58.82            0
$C$8    units       23.53            0
$D$8    units      125.00            0
$B$9    units       58.82            0
$C$9    units       23.53            0
$D$9    units      125.00            0
Constrain
