A Case Study of Bayesian Modeling on a Real World Problem
![Page 1: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/1.jpg)
1
A Case Study of Bayesian Modeling on a Real World Problem
RAM Energy Energester/Enziro
Bob Mattheys, Malcolm Farrow, Giles Oatley, Garen Arevian, Souvik Banerjee
![Page 2: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/2.jpg)
2
ISS – Intelligent Systems Solutions
- Group of researchers/academics
- Working with CAS (Centre for Adaptive Systems)
- Remit:
  - Provide technology transfer and expertise to industry
  - Assist NE SMEs and stimulate business growth
  - Obtain funding, e.g. SMART Awards, GONE, etc.
![Page 3: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/3.jpg)
3
ISS Projects
- RAM Energy – Intelligent Data Analysis
- Neptune Engineering – Intelligent Diagnostics
- HASS – Back-office system/DBase
- Hart Biological – Back-office system/DBase, process manufacturing
- Etc.
![Page 4: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/4.jpg)
4
RAM Energy
- Founded 2000
- Clients in Oil/Gas, Energy, Process, Manufacturing, Haulage Industry
- Products: Energester + Enziro
  - Ester-based synthetic lubricants and greases, enzymatic cleaning solutions, absorbents and blasting media
  - Better lubrication, heat dissipation and vibration reduction than oil or grease in isolation and conventional additives
![Page 5: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/5.jpg)
5
RAM Energy
Problem
- Demonstrate effectiveness and cost efficiency
- Data collected by RAM Energy
  - Very large
  - Major differences across the various sectors
- Assist RAM Energy in structuring their data collection and storage in general
- Heavy haulage industry
![Page 6: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/6.jpg)
6
RAM Energy
Trials
- RAM Energy carried out select trials with clients. These included:
  - Monitored consumption prior to Energester use
  - Monitored consumption post Energester use
  - Use of control vehicles (no Energester use)
  - Temperature data collected
![Page 7: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/7.jpg)
7
RAM Energy Haulage
- Data collected via diesel receipts
- Information consisted of:
  - Card number (allocated to registration number)
  - Vehicle registration
  - Date
  - Fuel
  - Mileage
![Page 8: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/8.jpg)
8
| Registration Number | Date (YYYYMMDD) | Reg Entered | Fuel Added | Mileage |
|---|---|---|---|---|
| J577PWL | 20020901 | DX51MYT | 276.19 | 128504 |
| J577PWL | 20020902 | DX51MTY | 296.51 | 129130 |
| J577PWL | 20020904 | DX51MYT | 288.88 | 999 |
| J577PWL | 20020905 | J577PWL | 235.95 | 666 |
| J577PWL | 20020907 | J577PWL | 346 | 1 |
| J577PWL | 20020907 | J577PWL | 234.86 | 1 |
| J577PWL | 20020908 | DX51NYT | 211 | 99999 |
| J577PWL | 20020909 | DX51MYT | 447.73 | 11 |
| J577PWL | 20020910 | 51 | 286.24 | 4717 |
| J577PWL | 20020910 | DX51MYT | 253.07 | 135300 |
| J577PWL | 20020911 | DX51MYT | 281 | 1 |
| J577PWL | 20020912 | 51 | 220.66 | 1000 |
| J577PWL | 20020912 | DX51MYT | 260 | 1 |
| J577PWL | 20020913 | DU02PBY | 325 | 1 |
| J577PWL | 20020914 | DU02PBY | 255.59 | 109705 |
| J577PWL | 20020915 | DU02RBY | 267.17 | 110296 |
| J577PWL | 20020915 | 2 | 267.62 | 120889 |
| J577PWL | 20020916 | DU02PBY | 182.16 | 111563 |
| J577PWL | 20020916 | DU52PBY | 260.02 | 112043 |
| J577PWL | 20020917 | 2 | 263.91 | 2646 |
| J577PWL | 20020917 | DU02PBY | 224.81 | 113223 |
| J577PWL | 20020918 | 2 | 251.09 | 3773 |
| J577PWL | 20020918 | DU02PBY | 224.67 | 114513 |
![Page 9: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/9.jpg)
9
RAM Energy
Analysis
- Performed using Excel spreadsheets
- Discrete mpg (mileage since last fill / diesel input)
- Some cumulative mpg (using total mileage / total diesel input to date)
- Attempt to normalise using mean temperature records
- Some regression analysis
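The two mpg measures above can be sketched as follows. This is a minimal illustration, not the original spreadsheet logic: it assumes fuel is logged in litres, that "mpg" means miles per imperial gallon, and that every fill tops the tank up so the fuel added replaces what was burned since the previous fill. The function name `mpg_series` and the sample figures are made up for the example.

```python
LITRES_PER_IMPERIAL_GALLON = 4.54609  # assumption: fuel logged in litres, mpg in imperial gallons


def mpg_series(fills):
    """fills: list of (odometer_miles, litres_added) in date order.

    Returns one (discrete_mpg, cumulative_mpg) pair per fill after the first.
    Assumes each fill tops the tank up, so the fuel added at a fill
    replaces the fuel burned since the previous fill.
    """
    results = []
    start_odometer = fills[0][0]
    total_litres = 0.0
    for (prev_odo, _), (odo, litres) in zip(fills, fills[1:]):
        total_litres += litres
        # Discrete mpg: miles since the last fill over the fuel just added.
        discrete = (odo - prev_odo) / (litres / LITRES_PER_IMPERIAL_GALLON)
        # Cumulative mpg: total miles over total fuel added to date.
        cumulative = (odo - start_odometer) / (total_litres / LITRES_PER_IMPERIAL_GALLON)
        results.append((discrete, cumulative))
    return results


# Made-up readings: 400 miles on 10 gallons, then 300 miles on 10 gallons.
example = mpg_series([(1000, 40.0), (1400, 45.4609), (1700, 45.4609)])
```

The cumulative figure smooths out fill-to-fill noise, which is why the two series can look quite different for the same vehicle.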
![Page 10: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/10.jpg)
10
[Figure: Fuel Consumption, Rover 75 W608 UOH. MPG (axis from 32 to 52) plotted against Fill No. (1 to 71); series: Discrete MPG, Cumulative MPG, Adjusted MPG.]
![Page 11: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/11.jpg)
11
RAM Energy Results
| | No seasonal adjustment | With seasonal adjustment |
|---|---|---|
| After Energester (mpg) | 42.94 | 43.46 |
| Before Energester (mpg) | 42.66 | 42.64 |
| Percentage gain | 0.64% | 1.92% |
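The percentage gain can be rechecked from the before and after means as (after - before) / before. The helper name below is hypothetical. Working from the rounded means on the slide, the seasonally adjusted column reproduces 1.92%, while the unadjusted column comes out nearer 0.66% than 0.64%; the slide's figure may have been computed from unrounded means.

```python
def percentage_gain(before_mpg, after_mpg):
    """Relative improvement in mpg, expressed as a percentage of the 'before' value."""
    return (after_mpg - before_mpg) / before_mpg * 100.0


no_adjustment = percentage_gain(42.66, 42.94)
with_adjustment = percentage_gain(42.64, 43.46)
```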
![Page 12: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/12.jpg)
12
RAM Energy Problems
Missing data consisted of:
- Driver information (who?)
- Loading information (full/empty)
- Length of journey
- Type of journey (long haul vs short haul)
- Urban or motorway conditions
- Etc.
![Page 13: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/13.jpg)
13
RAM Energy Conclusion
Results very poor and inconclusive
![Page 14: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/14.jpg)
14
Database
Excel sheets were converted to an Access database with deletion of unnecessary rows and columns.
The Access database was then imported into SQL Server for data query and subsequent analysis
![Page 15: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/15.jpg)
15
Data Cleansing
Brief outline of the most obvious problems with the data:
1. Card Number
2. Registration Number
3. Date
4. Fuel Added
5. Mileage
![Page 16: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/16.jpg)
16
Card Number
- There were duplicate Card Numbers for (presumably) the same card, e.g. 85944 and 0085944
- In a few cases, for a given Registration Number, there appear additional Card Numbers, e.g. for ‘N151EUB’ there are the Card Numbers:
  - 38195
  - 0038195
  - 56408
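A minimal sketch of how such duplicates might be reconciled, assuming that card numbers differing only by leading zeros refer to the same card (the slide itself only presumes this). The function names are hypothetical.

```python
from collections import defaultdict


def normalise_card(card):
    """Strip whitespace and leading zeros so '0085944' and '85944' compare equal."""
    stripped = str(card).strip().lstrip("0")
    return stripped or "0"


def cards_by_registration(rows):
    """rows: iterable of (registration, card_number) pairs.

    Returns {registration: set of normalised card numbers}; a registration with
    more than one entry still has genuinely different cards attached to it.
    """
    cards = defaultdict(set)
    for registration, card in rows:
        cards[registration].add(normalise_card(card))
    return dict(cards)


grouped = cards_by_registration(
    [("N151EUB", "38195"), ("N151EUB", "0038195"), ("N151EUB", "56408")]
)
```

For the example on the slide this leaves two distinct cards against ‘N151EUB’, which is the case that still needs a manual decision.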
![Page 17: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/17.jpg)
17
Registration Number
- Registration numbers always seemed to be entered correctly
- However, the field Reg Entered did not always tally with the Registration Number
- RAM recommended ignoring the Reg Entered field
![Page 18: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/18.jpg)
18
Date
- Dates were entered very consistently, which preserved:
  - the ordering
  - the distance between dates
  - the actual date
- An important question was: CAN WE PRESUME THE DATE IS ALWAYS ENTERED CORRECTLY?
- If this was so, then this provided us with a convenient check on the Mileage, as Date and Mileage should both increase together.
![Page 19: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/19.jpg)
19
Fuel
Outlier identification
- Very small and very large values are easily detected over a large dataset
- Take the mean of the sample and flag as outliers any data more than 3 or 4 SDs away from the mean
- Very small values, e.g. 0 or 1, assumed to be bogus values
- 9999, 999, etc. taken to be bogus values
- Some small and large values were mistyped, with either the decimal point occurring too soon (e.g. 38.6 instead of 386) or extra digits added (e.g. 3860 instead of 386)
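The flagging rules above might look like this in code. It is a sketch under assumptions: the set of placeholder values and the function name are illustrative, and the 3 SD threshold is one of the two options the slide mentions. Note that a single extreme value inflates the standard deviation, so in small samples an outlier can mask itself; the rule works best over a large dataset, as the slide says.

```python
from statistics import mean, stdev

SENTINELS = {0, 1, 999, 9999}  # assumption: placeholder values typed instead of a real reading


def flag_fuel(values, n_sd=3.0):
    """Return (value, reason) pairs; reason is None for readings that pass both checks."""
    # Exclude placeholders before estimating the mean and SD of genuine readings.
    plausible = [v for v in values if v not in SENTINELS]
    centre, spread = mean(plausible), stdev(plausible)
    flagged = []
    for v in values:
        if v in SENTINELS:
            reason = "sentinel"
        elif abs(v - centre) > n_sd * spread:
            reason = "outlier"
        else:
            reason = None
        flagged.append((v, reason))
    return flagged


# Made-up fuel quantities with one extra-digit typo and two placeholders.
sample = [250.0] * 10 + [270.0] * 10 + [3860.0, 0, 999]
result = flag_fuel(sample)
```

A misplaced decimal such as 38.6 is not far enough from the mean to be caught by this rule, so it needs a separate check.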
![Page 20: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/20.jpg)
20
Fuel
Difficult errors
- e.g. 693392: could this be 69392? What if it were 693399?
- Data must be flagged as erroneous
![Page 21: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/21.jpg)
21
Mileage
- Some values were entered as {0, 1, 999, 9999, 2, 3, 5, 10, 111, 1111, 123, 789, etc.}
- If we can presume that the Date is a sensible value, then in a dataset where there are only a few missing or obviously incorrect values for the Mileage, these values can be amended as follows:
![Page 22: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/22.jpg)
22
Mileage
| Day | Mileage | Spurious? |
|---|---|---|
| 11 | 300 | |
| 12 | 400 | |
| 13 | 500 | ? |
| 14 | 450 | ? |
We do not know if the day 13 entry is wrong, or day 14. So we can look ahead:
![Page 23: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/23.jpg)
23
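Mileage

| Day | Mileage | Spurious? |
|---|---|---|
| 11 | 300 | |
| 12 | 400 | |
| 13 | 500 | |
| 14 | 450 | ? |
| 15 | 510 | |

Or

| Day | Mileage | Spurious? |
|---|---|---|
| 11 | 300 | |
| 12 | 400 | |
| 13 | 500 | ? |
| 14 | 450 | |
| 15 | 470 | |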
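One reading of the look-ahead rule in the two cases above: when a reading drops below its predecessor, the next reading decides which of the two to blame. If the next reading is at least as large as the earlier (higher) one, the dip is taken as spurious; if it only continues from the lower one, the earlier spike is. This is an interpretation of the slide's example, not the authors' stated algorithm, and `spurious_index` is a hypothetical name.

```python
def spurious_index(readings, i):
    """readings[i] < readings[i - 1] breaks the expected ordering.

    Uses the single next reading to decide which of the two is spurious.
    Returns i, i - 1, or None when the look-ahead cannot decide.
    """
    if i + 1 >= len(readings):
        return None  # nothing to look ahead to
    earlier, later, following = readings[i - 1], readings[i], readings[i + 1]
    if following >= earlier:
        return i      # the next value agrees with the earlier one: the dip is spurious
    if following >= later:
        return i - 1  # the next value continues from the lower one: the spike is spurious
    return None       # the next value is lower than both; look further ahead


first_case = spurious_index([300, 400, 500, 450, 510], 3)
second_case = spurious_index([300, 400, 500, 450, 470], 3)
```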
![Page 24: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/24.jpg)
24
Mileage
| Trans Quantity (Fuel Added) | Odometer (Mileage) |
|---|---|
| 182.04 | 55525 |
| 236 | 0 |
| 290 | 1 |
| 268.33 | 57589 |

Collapsed to:

| Trans Quantity (Fuel Added) | Odometer (Mileage) |
|---|---|
| 182.04 | 55525 |
| 236 + 290 + 268.33 = 794.33 | 57589 |
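The collapsing step can be sketched as below: fuel from rows whose odometer reading is unusable is carried forward and added to the next row with a usable reading. The set of bogus odometer values and the function name are assumptions for illustration.

```python
BOGUS_ODOMETER = {0, 1, 999, 9999}  # assumption: readings treated as unusable


def collapse_fills(rows):
    """rows: list of (fuel_added, odometer) in date order.

    Fuel from rows with a bogus odometer is summed into the next row that
    has a usable reading. Fuel carried past the last usable row is dropped.
    """
    collapsed = []
    carried = 0.0
    for fuel, odometer in rows:
        if odometer in BOGUS_ODOMETER:
            carried += fuel
        else:
            collapsed.append((round(carried + fuel, 2), odometer))
            carried = 0.0
    return collapsed


merged = collapse_fills([(182.04, 55525), (236, 0), (290, 1), (268.33, 57589)])
```

The total fuel is preserved, so mileage per unit of fuel can still be computed over the longer interval between the two valid odometer readings.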
![Page 25: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/25.jpg)
25
Mileage
- Small and very large values could be ignored
- Problem was determining whether any of the remaining data was valid (data validation)
- Evaluating the degree of correlation between the increasing Date and the supposedly increasing Mileage
- Useful approaches for estimating rank-orderedness and correlation between lists:
  - Spearman’s coefficient of rank correlation
  - Kendall’s tau
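Both coefficients are simple to compute for a date-ordered list of odometer readings. The sketch below uses plain Python, the tau-a form of Kendall's coefficient, and the no-ties formula for Spearman's; these choices and the function names are assumptions made for the example, and real data with tied values would need the tie-corrected versions.

```python
def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs. Ties count as neither."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


def spearman_rho(xs, ys):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)); valid only when there are no ties."""
    def ranks(values):
        order = sorted(range(len(values)), key=values.__getitem__)
        rank = [0] * len(values)
        for position, index in enumerate(order, start=1):
            rank[index] = position
        return rank

    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


days = [11, 12, 13, 14, 15]
mileage = [300, 400, 500, 450, 510]  # one out-of-order reading
tau = kendall_tau(days, mileage)
rho = spearman_rho(days, mileage)
```

A value of 1 means Date and Mileage are in exactly the same order; each out-of-order reading pulls the coefficient below 1.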
![Page 26: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/26.jpg)
26
Data Cleansing
![Page 27: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/27.jpg)
27
Ram Energy Data Validator
![Page 28: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/28.jpg)
28
![Page 29: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/29.jpg)
29
![Page 30: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/30.jpg)
30
Bayesian - Approach
- In the Bayesian approach to statistical inference, we express uncertain beliefs about things in terms of probability
  - E.g. that there is a 50% chance that the average fuel consumption of a vehicle will be less than 30 mpg
- Can use probabilities in this way to describe uncertainty about things we do not know
  - E.g. the amount of fuel in a vehicle’s tank at 10.00am yesterday
![Page 31: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/31.jpg)
31
Bayesian - Approach
- Once we accept this view of probability, the principle for learning from data is simple
- Before we see the data, we have a probability distribution based on our knowledge up to that point: the prior distribution
- When we see the data, our probability distribution changes in the light of the new information in the data: the posterior distribution
![Page 32: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/32.jpg)
32
Bayesian - Approach
- Calculation used to get from the prior distribution to the posterior distribution
  - Uses Bayes’ theorem
  - Hence Bayesian statistics
- Very straightforward interpretation of the results when using this method
- Posterior distribution tells us how likely it is that various things are true, after we have used the evidence in the data
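For reference, Bayes' theorem in its usual form for an unknown quantity and observed data. The symbols are assumptions introduced here, not notation from the slides: θ stands for the unknown quantity (for example, fuel used per mile) and y for the data.

```latex
\[
  p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{\int p(y \mid \theta')\, p(\theta')\, d\theta'}
\]
```

Here p(θ) is the prior distribution, p(y | θ) is the likelihood of the data, and p(θ | y) is the posterior distribution; the denominator only rescales the result so that it integrates to 1.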
![Page 33: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/33.jpg)
33
Bayesian - Approach
- Different observers can have different prior beliefs, and this means that their posterior distributions will also be different
  - Can make the prior distribution represent very little information
  - In practice, the prior then tends to have little effect on the posterior
- One advantage of this approach is that it is straightforward to calculate what we expect various things to be after seeing the data
  - For example, can calculate a posterior probability distribution for the cost savings of applying the fuel additive to a whole vehicle fleet
![Page 34: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/34.jpg)
34
Bayesian - Model
- The basic model used is a regression, with fuel used as the dependent variable and distance travelled as one of the explanatory variables
- Each observation corresponds to the time between two successive additions of fuel to the fuel tank
- We expect zero fuel to be used if zero distance were travelled, but the amount of fuel used is not necessarily proportional to the distance travelled
  - For example, fuel efficiency may be greater on longer journeys
![Page 35: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/35.jpg)
35
Bayesian - Model
- In the simplest form of the model, assume that fuel used is proportional to distance travelled
  - The constant of proportionality is the slope of the line on a graph
- Various other forms of relationship were also investigated
- While distance travelled is the most obvious explanatory variable, there are several other variables and factors which must be taken into account
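For the proportional model alone, before any of the other factors are added, the posterior for the slope has a closed form if we assume normally distributed noise with a known standard deviation and a normal prior on the slope. These are simplifying assumptions chosen for the sketch; the slides do not state the distributions actually used, and the function name and numbers are illustrative.

```python
from math import sqrt


def slope_posterior(distances, fuels, prior_mean, prior_sd, noise_sd):
    """Posterior mean and SD of the slope b in: fuel = b * distance + noise.

    Assumes noise ~ Normal(0, noise_sd^2) with noise_sd known,
    and a Normal(prior_mean, prior_sd^2) prior on b.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = sum(d * d for d in distances) / noise_sd ** 2
    weighted_sum = sum(d * f for d, f in zip(distances, fuels)) / noise_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_mean * prior_precision + weighted_sum) / post_precision
    return post_mean, sqrt(1.0 / post_precision)


# Made-up data lying exactly on fuel = 0.5 * distance.
post_mean, post_sd = slope_posterior(
    distances=[100, 200], fuels=[50, 100], prior_mean=0.5, prior_sd=1.0, noise_sd=10.0
)
```

With a wide prior the data precision dominates, which illustrates why a weakly informative prior has little effect on the posterior.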
![Page 36: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/36.jpg)
36
Bayesian - Factors
Vehicle types
- Type of vehicle has an effect
- Individual vehicles of the same type may also have different characteristics
- Effect of individual vehicles (within a type) was regarded as a random effect
- Vehicles seen as a sample from all vehicles of that type
![Page 37: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/37.jpg)
37
Bayesian - Factors
Drivers
- Driver identified by card number
- Drivers closely associated with vehicles
- In this case, difficult to separate the effects of vehicles from the effects of drivers
- However, if this were not the case, then it would be possible to make inferences about individual drivers as well as individual vehicles
![Page 38: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/38.jpg)
38
Bayesian - Factors
Time of year
- Fuel efficiency may be affected by ambient temperature/meteorological variables
- Ideally use meteorological data
- Obtained data for this purpose
- But, as a first step, a simple substitute is to use the time of year, e.g. month
![Page 39: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/39.jpg)
39
Bayesian - Factors
Presence of fuel additive
- The main question of interest is, “How does the use of the fuel additive affect fuel consumption?”
![Page 40: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/40.jpg)
40
Bayesian - Complications
Fuel
- Not known: how full the fuel tank was before or after fuel was added
- Not known: precisely how much fuel was used between fills
- True tank content regarded as a latent or “hidden” variable
  - Such variables can be built into a Bayesian analysis
![Page 41: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/41.jpg)
41
Bayesian - Complications
Data entry errors
- Graph of odometer readings against date for a single vehicle shows the general pattern, along with spurious values
- This is built into the model by allowing certain prior probabilities for errors of different types
- The analysis can thus “recognise” errors by calculating posterior probabilities that a reading is an error of the various types
- Those values which have large posterior probabilities of being erroneous are, in effect, ignored by the rest of the analysis
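The idea can be illustrated with a two-component mixture and Bayes' rule. This sketch is a simplification: it uses a single error type (the slides mention several), assumes a valid reading is normally distributed around an expected odometer value, and assumes an erroneous reading is equally likely to fall anywhere between 0 and a maximum. All names and numbers are illustrative.

```python
from math import exp, pi, sqrt


def error_posterior(reading, expected, valid_sd, prior_error, max_reading):
    """Posterior probability that an odometer reading is a data entry error.

    Valid readings: Normal(expected, valid_sd^2).
    Erroneous readings: Uniform(0, max_reading) (assumed single error type).
    prior_error is the prior probability that any given reading is an error.
    """
    z = (reading - expected) / valid_sd
    valid_density = exp(-0.5 * z * z) / (valid_sd * sqrt(2 * pi))
    error_density = 1.0 / max_reading
    error_weight = prior_error * error_density
    valid_weight = (1.0 - prior_error) * valid_density
    return error_weight / (error_weight + valid_weight)


# A placeholder-like value versus a reading close to what the date implies.
p_bad = error_posterior(999, expected=129700, valid_sd=300, prior_error=0.05, max_reading=1_000_000)
p_good = error_posterior(129130, expected=129130, valid_sd=300, prior_error=0.05, max_reading=1_000_000)
```

Readings whose posterior error probability is close to 1 contribute almost nothing to the valid-reading component, which is the sense in which they are ignored by the rest of the analysis.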
![Page 42: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/42.jpg)
42
Bayesian - Conclusions
- Prototype Bayesian models were successfully run
- Demonstrated feasibility of the approach for this problem
- However:
  - Need to overcome problems of missing data
  - Uncertainty over when the additive would be expected to have an effect
  - Pattern of this effect
  - Confounding of the additive effect with the effects of other factors, such as the changing seasons
![Page 43: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/43.jpg)
43
Bayesian Results
Posterior probability density for the effect of the additive, in litres per mile
![Page 44: A Case Study of Bayesian Modeling on a Real World Problem](https://reader035.fdocuments.us/reader035/viewer/2022070402/568137e4550346895d9f933d/html5/thumbnails/44.jpg)
44
Conclusions
Recommendations:
- Design of better trials and data acquisition
- Collection of ambient temperatures, etc.

Future directions:
- Fraud detection
- Efficiency of individual drivers/vehicles
- Patterns of work, optimisation