Building Bayesian networks from basin modeling scenarios...


Building Bayesian networks from basin modeling scenarios for improved geological decision making

Gabriele Martinelli*, Jo Eidsvik
Dept. of Mathematical Sciences, Alfred Getz' vei 1,
Norwegian University of Science and Technology, Trondheim, Norway

Richard Sinding-Larsen, Sara Rekstad
Dept. of Geology and Mineral Resources Engineering, Sem Sælands veg 1,
Norwegian University of Science and Technology, Trondheim, Norway

Tapan Mukerji
Department of Energy Resources Engineering, School of Earth Sciences,
Stanford University, USA

Abstract

Basin models are used to gain insights about a petroleum system and to simulate geological processes required to form oil and gas accumulations. The focus of such simulations is usually on charge and timing-related issues, although uncertainty analysis about a wider range of parameters is becoming more common. Bayesian Networks are useful for decision making in geological prospect analysis and exploration. In this paper we propose a framework for merging these two methodologies: by doing so, we explicitly account for dependencies between the geological elements. The probabilistic description of the Bayesian Network is trained by using multiple scenarios of Basin and Petroleum Systems Modeling. A range of different input parameters are used for total organic content, heat flow, porosity, and faulting, to span a full categorical design for the Basin and Petroleum Systems Modeling scenarios. Given the consistent Bayesian Network for trap, reservoir and source attributes, we demonstrate important decision making applications such as evidence propagation and the value of information.

Keywords: Bayesian Networks, Scenario Evaluation, Basin Modeling, Uncertainty Quantification, Petroleum Exploration

*Corresponding author. Email address: [email protected], [email protected] (Gabriele Martinelli)

Preprint submitted to Petroleum Geoscience


1. Introduction

The correct integration of geological and geophysical information within a decision framework for oil and gas exploration is a challenge that will become more important as the cost and difficulty of exploring new targets increase. Currently it is common practice among scientists to quantify information about risk through detailed exploration analysis, and then forward these results to management. From the geophysicists' side we can interpret 2D and 3D seismic surveys and magnetic, gravimetric and electromagnetic data. From the geologists' side we can evaluate the chance of having adequate trapping, reservoir facies, seal capacity and charge. The latter is aided by basin modeling studies. Other aspects concerning the economic evaluation of a prospect (costs and investments connected to development in case of success) must also be taken into account by the decision makers. In the transition towards the decision makers the information is processed and quantified through expert opinions and commercial software (such as GeoX®) for risk assessment, multiple-scenario evaluation and estimation of the amount and value of the hydrocarbon (HC) resources under study. In this work we propose a supplement to the existing framework by directly integrating basin modeling scenarios and decision strategies.

We can identify the problem by analyzing how we currently move from the Earth model to the decision space: the geological and geophysical know-how is first translated into basin and petroleum system modeling (BPSM). Outputs from multiple runs of basin modeling under different geologic scenarios are then used to establish a Bayesian network (BN) that models play element dependencies. The BN is used to test decisions. In this work we have used a common commercial software package for BPSM, namely PetroMod®. PetroMod is based on a finite-element simulator (Hantschel and Kauerauf, 2009) that numerically solves the coupled system of equations for sediment compaction, heat flow, petroleum generation and migration, accounting for both chemical and physical processes. We have not used the PetroRisk® extension of the software, since we needed to control explicitly all the input scenarios.

In this framework a sensitivity analysis is then carried out, and a database with multiple runs (corresponding to different geologic scenarios) is built. The database is the starting point for the value assessment part that provides the basis for efficient decisions.

The idea of modeling play element prospect dependencies with a BN was proposed in VanWees et al. (2008) and Martinelli et al. (2011). Martinelli et al. (2011) constructed a BN model for assessing the likelihood of source presence in a part of the North Sea. The network describes the prior distribution of the source system in terms of kitchen, prospects and segments. We will use the word segment for identifying a volume possibly filled with HC resulting from a source-reservoir-trap system, while we will use the word prospect for describing a collection of segments that may share some common features.

When the BN is established, one can use standard techniques to propagate the evidence at certain nodes to all other nodes. This allows us to study the value of information (VOI) at one or more segments (Bhattacharjya et al., 2010). Similar ideas were developed in VanWees et al. (2008). We will use a Matlab package developed by Murphy (1999) to learn, build and perform inference on the network.
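Evidence propagation and VOI can be illustrated on a toy network. The sketch below is only an illustration under stated assumptions: it uses plain Python enumeration rather than the Matlab package mentioned above, the network (one kitchen node feeding two segments) is a deliberately small stand-in, and all probabilities, the revenue and the cost are hypothetical numbers that do not come from the case study.

```python
from itertools import product

# All numbers below are hypothetical and serve only to illustrate the mechanics.
P_KITCHEN = {1: 0.6, 0: 0.4}        # P(K): the kitchen has generated HC
P_SEG_GIVEN_K = {1: 0.7, 0: 0.05}   # P(S = 1 | K), same for both segments
REVENUE, COST = 100.0, 50.0         # payoff if the segment is charged, drilling cost


def joint(k, s1, s2):
    """Joint probability P(K=k, S1=s1, S2=s2) from the network factorization."""
    def p_seg(s, parent):
        p = P_SEG_GIVEN_K[parent]
        return p if s == 1 else 1.0 - p
    return P_KITCHEN[k] * p_seg(s1, k) * p_seg(s2, k)


def marginal_s1(s1):
    """P(S1 = s1), summing out the kitchen and the second segment."""
    return sum(joint(k, s1, s2) for k, s2 in product((0, 1), repeat=2))


def posterior_s2(s1):
    """P(S2 = 1 | S1 = s1): evidence at segment 1 propagated to segment 2."""
    return sum(joint(k, s1, 1) for k in (0, 1)) / marginal_s1(s1)


def drill_value(p_success):
    """Expected value of the best action: drill only if it pays off."""
    return max(0.0, p_success * REVENUE - COST)


prior_s2 = sum(joint(k, s1, 1) for k, s1 in product((0, 1), repeat=2))
prior_value = drill_value(prior_s2)
posterior_value = sum(marginal_s1(s1) * drill_value(posterior_s2(s1)) for s1 in (0, 1))
voi = posterior_value - prior_value  # value of observing segment 1 before deciding on segment 2

print(f"P(S2=1) = {prior_s2:.3f}, P(S2=1 | S1=1) = {posterior_s2(1):.3f}, VOI = {voi:.2f}")
```

With these illustrative numbers the probability of charge in the second segment rises from 0.44 to about 0.67 after a discovery in the first segment, and the VOI is 7.5 in the same monetary units. Without the observation the best action is not to drill, which is why the information has value.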

One of the critical points of Martinelli et al. (2011) was the substantial reliance on expert opinion when designing the BN. In the present paper we propose an alternative idea for building the BN, integrating expert opinions with quantitative geological data. The main idea is to train the probabilistic structure of the BN from the multiple basin modeling outputs. This is done by statistical parameter estimation, together with discretization and clustering guided by geological intuition. This BN model couples the geological processes and their responses with risk assessment. By assigning expected revenues to segments, the production strategy and other required economic variables can now easily be communicated. The BN model provides explicit probability statements, for single segments and for prospects.
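The training step can be sketched as discretization followed by frequency counting. The example below is a minimal illustration, not the estimation procedure used in the paper: it takes the Top East oil accumulations reported in Table 1, assumes all 24 scenarios are equally likely, and uses a hypothetical threshold of 400 MMBOE and a hypothetical Laplace smoothing constant to estimate the conditional probability table of a "large accumulation" node given the porosity level.

```python
from collections import Counter

# Porosity level and Top East (Mmd) oil accumulation [MMBOE], models 1 to 24 of Table 1.
POROSITY = ["high", "low"] * 12
TE_OIL = [580, 247, 776, 220, 736, 212, 537, 247, 773, 218, 731, 207,
          265, 218, 659, 218, 528, 206, 265, 218, 661, 218, 527, 206]

THRESHOLD = 400.0  # hypothetical cut between "small" and "large" accumulations
ALPHA = 1.0        # hypothetical Laplace smoothing constant


def discretize(value, threshold=THRESHOLD):
    """Map a continuous accumulation to a categorical state."""
    return "large" if value >= threshold else "small"


def estimate_cpt(parents, children, states=("small", "large"), alpha=ALPHA):
    """Return P(child state | parent state) from smoothed relative frequencies."""
    counts = Counter(zip(parents, children))
    cpt = {}
    for parent in sorted(set(parents)):
        total = sum(counts[(parent, s)] for s in states) + alpha * len(states)
        cpt[parent] = {s: (counts[(parent, s)] + alpha) / total for s in states}
    return cpt


cpt = estimate_cpt(POROSITY, [discretize(v) for v in TE_OIL])
for parent, row in cpt.items():
    print(parent, {state: round(p, 3) for state, p in row.items()})
```

The smoothing keeps every entry strictly positive, so a state that never occurs in the 24 runs is not ruled out with certainty; whether that is desirable is a modeling choice.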

Using statistical design of experiments (DOE) in oil and gas forecasting problems is not new: Damsleth et al. (1992) and Dejean and Blanc (1999) propose a DOE-based approach for reservoir modeling simulations; Corre et al. (2000) extend DOE and Monte Carlo methods in order to study uncertainties in geophysics, geology and reservoir engineering. Other relevant works include Wendebourg (2003) and Wendebourg and Trabelsi (2005); the former uses DOE for determining and later calibrating the most influential variables and thus reducing the uncertainty of the outcome, while in the latter sensitivity analysis is performed on critical parameters that determine petroleum generation and migration.

Dependency among wildcat wells has been discussed in Kaufman and Lee (1992), where a binary logit model for the number of successes is proposed. Kaufman and Lee (1992) mention, though, that the forecasting capacity of the model was poor in the absence of a correct geological model of the basin.

The paper is organized as follows: in Section 2 we introduce BPSM and the synthetic case study; Section 3 discusses the DOE simulation setup with interpretations. In Section 4 we show the procedure for developing the BN model. In Section 5 we apply the model for decision making, and in Section 6 we provide some guidelines and discussion topics for the extension of the methodology to a real case study. Section 7 concludes.

2. A Case study for basin and petroleum systems modeling

2.1. Basin and Petroleum Systems Modeling

BPSM is a useful component in exploration risk assessment and is applicable with increasing reliability during all stages of exploration, from frontier basins with no well control to more mature areas. The idea is to simulate the geological and chemical reactions that have occurred in the basin through geological time, in order to identify the critical aspects of the HC generation, migration and accumulation. Important geological risk factors in oil and gas exploration are the trapping (consisting of trap geometry, reservoir and seal), the oil and gas charge (migration and source factors), and the timing relationship between the charge and the formation of potential traps. These risk factors apply equally to basin, play and prospect scale assessments. BPSM software combines seismic, well, geological and petrophysical information to model the evolution of a sedimentary basin. As output, PetroMod® predicts if, and how, a reservoir has been charged with HC, including the source and timing of HC generation, the migration routes and the amount of HC at both subsurface and surface conditions.

In this paper we will use the 3D version of the software, which allows full visualization of the migration paths that lead to the accumulation of HC in the basin.

2.2. The Bezurk case study

As a training model we have decided to use a synthetic basin developed in the Petroleum Geology class at NTNU, Trondheim, Norway (Tviberg, 2011). The controlled basin environment is called Bezurk Basin (Figure 1), and it includes three potential kinds of prospects, namely an anticlinal type, a fault type and a shoestring type. The latter is located within impermeable shale and consequently the chances for HC to migrate into this reservoir are low; therefore we will not use it in our discussion. The Bezurk basin mimics the behavior of a possible real basin with a main anticlinal trap in the NE sector of the basin, and a series of faults in the NS direction. All lithologies are based on those in the PetroMod library and default values are used for the sand/shale compaction coefficient, porosity-permeability trends and type of consolidation/sealing relations. A description can be found in the supplementary materials. A major uplift followed by strong erosion has occurred in the western part of the basin, and this activity has caused the major faulting shown by Faults 1 and 2.

The history of the basin has been characterized by the deposition of organic-rich shale and good-porosity sandstone layers. In particular, we recognize two main possible HC producing layers, the deepest being the coal bed layer denominated Eek, and the shallowest being a shale rich in organic content denoted Mlf.

Figure 1: Bezurk basin; we see the 100 km² area and the different thicknesses of the layers; in the west part of the basin we identify the two faults that characterize the system. (Labels in the figure: Fault 1, Fault 2, Eek (source), Ou (res.), Mlf (source), Mmd (res.).)

Another assumption is that the Bezurk basin is an onshore basin, with sediment surface at zero meters above the sea level. The depositional history started 55 Ma ago and has continued until today, with a number of erosional episodes. Figure 2 shows two cross-sections of the basin. Marked in black (Eek) and pink (Mlf) are the two main source rocks, and in yellow (Mmd) and red (Ou) the two main reservoirs. The third reservoir layer, a shoestring reservoir, is best visible in the second cross-section and lies between the two Mua seal layers just in the synclinal part of the basin. The main anticlinal reservoirs are clearly visible in the first cross-section, in the eastern part of the basin.

We have identified two main plays, corresponding to the two main potential reservoir rocks:

• The reservoir of the Mmd play in the Bezurk Basin is made up of sandstone, deposited in a regressive shallow marine environment during the time interval 20 Ma to 15 Ma. The sandstone reservoir has porosity ranging from 12% to 30%, which is considered to be a good porosity. The reservoir covers the whole area on the east side of the faults, and has a thickness ranging from about 300 m to 900 m.

• The reservoir of the Ou play was deposited from 34 Ma to 23 Ma in a transgressive shallow marine environment, with the overlying Mlf shale acting as a seal. The underlying Eek coal was deposited on a coastal plain in the same transgressive system as the reservoir and is expected to generate HC due to its depth of burial and the corresponding heat flow. Potential traps lie along the western faults and in the four-way closure of the northeastern anticline; they are similar to the traps of the younger Mmd play. The porosity of the Ou reservoir ranges from 7% to about 20%, which overall is lower than the porosity in the Mmd reservoir. Both reservoirs have the same kind of sandstone, but due to compaction the lower reservoir (Ou) has a lower porosity than the upper reservoir (Mmd).

Generated HC are expected to migrate to the overlying reservoir. The critical factor is the geological timing, both for the Ou play and for the Mmd play. In both plays the seal is deposited on top of the reservoir rock. The sealing efficiency may be inadequate to keep the HC inside the trap in scenarios with early generation and migration. This can cause large amounts of HC to be lost.

The basin is exposed to normal faulting at a young age (11 Ma). Two faults are observed in the profiles (western part). The faults are considered to be closed faults. HC accumulated in these trap segments constitute the fault prospect. The critical factors of the prospects are the uplift and erosion related to the faulting.

2.3. Expected Results

• HC generation: Both source rocks are buried deep enough to generate HC. Eek is deposited in a coastal plain environment in the time interval 34.8 Ma to 34 Ma, and is today located at a depth of about 3000 m to 5000 m. The lithology of the deepest source rock (Eek) is coal, which mainly generates gas, but can also be oil prone. The source rock which today is at a depth of 2000 m to about 4500 m is the Mlf black shale. Mlf is deposited in a deep marine environment in the time interval 20.60 Ma to 20.00 Ma, and is expected to generate both oil and gas. The generated HC are expected to migrate into the overlying Anticlinal prospect and the Fault prospect.

• Anticlinal prospect: The Anticlinal prospect is expected to contain HC in both the Mmd reservoir and the Ou reservoir. It has a four-way closure and no large risks are related to the trapping mechanism. The sealing rocks for both reservoirs are shale, which over time are expected to obtain adequate sealing capacity and thus prevent the HC from leaking during the last part of the migration process. The lower accumulation is expected to contain more gas than the upper accumulation, due to the Eek source rock being more gas prone.

• Fault prospect: The Fault prospect contains some more uncertainties regarding HC preservation. The trap mechanism is a normal fault, which has remained closed from 11 Ma to today. However, the effect of the uplift and the subsequent erosion in the western part of the basin needs to be modeled: will the timing of the fault and its sealing capacity be adequate to hold accumulations in place throughout the basin development? Other crucial questions that need to be evaluated relate to the change in the geometry of the basin with time and how it affects the flow paths, and the size of the drainage area of the anticline that influences the accumulation of the Fault prospect.

Figure 2: Cross sections. In the top one we can recognize the four-way anticlinal trap located in the eastern part of the basin; in both we can identify the faults in the western part of the basin. (Layer labels in the first section: Qal, Mua (seal), Mmd (res.), Mlf (source), Ou (res.), Eek (source), Base. Layer labels in the second section: Qal, Muh, Mlq (res.), Mua (seal), Mua (seal), Mmd (res.), Mlf (source), Ou (res.), Eek (source), Base.)

2.4. The master model

We have designed a master model by establishing a plausible petroleum system scenario and a series of boundary conditions. In particular we have chosen a constant heat flow (HF) of 60 mW/m², which corresponds to a moderately active basin (Allen and Allen, 2005). We have estimated the paleo water depth (PWD) according to the depositional environment through time (see Table 5 in the Supplementary materials). Finally, since Bezurk is conceived as an onshore basin there is no water present, and the sediment-water-interface temperature (SWIT) is in reality the sediment-air-interface temperature.

An illustrative run (Figure 3) shows that the only segments filled with HC today are the two trap segments of the anticlinal formations in the eastern part of the basin. We see traces of HC against the wall of the closed faults, but no significant accumulation. Figure 3 shows paths and drainage areas, illustrating how the migration at the present time converges on the anticlines, while a minor part of the HC migrates westwards toward Fault 2.

The HC that migrate into the fault prospect are mainly lost during the time step of 1.77 Ma to 1.55 Ma (Figure 4), which is the critical time when the Muh seal is eroded. This particular uplift creates erosion, and losses can consequently be explained by the change in the geometry of the basin. The reservoir layer creates a small anticlinal trap structure against the fault where the HC accumulate. After the uplift the trap structure flattens out and the HC migrate out of the trap.

Figure 3: HC accumulations and flow paths for Mmd play (left) and Ou play (right) at present day. We see the oil (green) and gas (red) accumulations in the anticlinal segments, with traces of HC in the fault segment. The drainage area of the anticlinal traps is much larger than the drainage area for the fault traps.

Text embedded in the figure: the simulation shows two accumulations in the Anticline prospect, which constitutes almost 100% of the total resources in the basin; as expected, the lower accumulation contains more gas. The embedded table of accumulations in the Bezurk Basin (extracted from PetroMod) reads:

Segment                                      Oil (1e6 STB)  Assoc. gas (1e9 scf)  Non-assoc. gas (1e9 scf)  Condensate (1e6 STB)
Anticlinal top segment (Mmd), acc. no. 25    866.5          236.49                0.12                      0.01
Total Mmd reservoir                          866.5          236.49                0.16                      0.01
Anticlinal bottom segment (Ou), acc. no. 38  149.97         151.26                92.03                     2.18
Total Ou reservoir                           149.98         151.44                95.64                     2.19

Figure 4: Accumulation in the fault segments; screenshot of the process at 1.77 Ma and 1.55 Ma. Most of the HC leak out during and after the uplift of the basin. (Panels: HC accumulation at a) 1.77 Ma and b) 1.55 Ma, with close-up views at the same two times; in the close-up the Mmd reservoir is shown as a transparent layer, with oil in green and gas in red.)

3. Basin modeling scenarios

During the analysis of the basin we have been able to identify four critical elements that constitute possible sources of uncertainty in our model. In real life there are large uncertainty ranges in most of the input parameters, and previous studies such as Lerche (1997) and Wendebourg (2003) have discussed the problem thoroughly. Usually the modeller intends to constrain the model with the sparse measurements available, and leave more uncertainty for those parameters that cannot be measured directly, such as the heat flow (HF), or that present a larger range, such as porosity or Total Organic Carbon (TOC) content. To accommodate the uncertainty in our synthetic basin, several scenarios for TOC content, HF and porosity are considered. We have also noticed that there is a zone in the western part characterized by prominent faulting activity; for this reason we can hypothesize a possible structural uncertainty, by adding or removing one of these fault elements from our model. Usually the erosion magnitude is another common uncertainty factor; our choice not to include it is due mainly to the way the erosion is modelled in the PetroMod maps, which makes it difficult to modify in a consistent way.

We next run multiple scenarios of BPSM, changing the key factors in a controlled design of experiment (DOE).

3.1. A full factorial design

In order to study the interactions among the different factors, we have designed a full factorial study (Fisher, 1971), where each factor is represented by two or three levels. We have chosen three levels for the heat flow (HF): cool (50 mW/m²), normal (60 mW/m²) or hot (70 mW/m²); it is expected that a cool basin will mainly stay in the oil window, consequently generating mostly oil, while a hot basin will reach the gas window at an earlier stage, and therefore generate more gas. We have further chosen two levels for the porosity of the reservoir rock, high or low (see profiles in Figure 5). We use two levels for the TOC content of both source rocks, with TOC of 8% (high) or 4% (low) for the Mlf black shale and 20% (high) or 10% (low) for the Eek coal. Finally, we select two levels, open or closed, for the presence of a new fault (Fault 3) located east of Fault 2. Table 1 summarizes the results from the scenarios. From the master model (see Section 2.4), we observe that the HC which accumulated in the fault trap were lost during the time period of 1.77 Ma to 1.55 Ma. The reason for adding Fault 3 is to see if this could trap HC and potentially create a prospect.
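The full factorial design described above can be enumerated in a few lines. The sketch below is an illustration only: the factor and level names follow the text, while the helper function and the ordering of the factors (chosen so that porosity varies fastest, as in the run order of Table 1) are assumptions made for this example.

```python
from itertools import product

# Factor levels as described in the text. The factor order is an assumption,
# chosen so that the enumeration reproduces the run order of Table 1.
FACTORS = {
    "TOC": ["high", "low"],
    "Fault 3": ["closed", "open"],
    "HF": ["cool", "normal", "hot"],
    "Porosity": ["high", "low"],
}
HEAT_FLOW_MW_M2 = {"cool": 50, "normal": 60, "hot": 70}


def full_factorial(factors):
    """Return one dict per combination of factor levels (last factor varies fastest)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


design = full_factorial(FACTORS)
for model, run in enumerate(design, start=1):
    print(model, run["Porosity"], run["HF"], HEAT_FLOW_MW_M2[run["HF"]], run["Fault 3"], run["TOC"])
```

Each of the 2 × 3 × 2 × 2 = 24 rows corresponds to one BPSM run; a fractional design would need fewer runs but would confound some interactions.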

Figure 5: Porosity profiles; on the left the high case, with initial porosity around 40%; on the right the low case, with initial porosity around 30% and a rapid decrease.

3.2. Simulation outcomes

In each of the 24 different BPSM runs, we measure the size and type of HC accumulations. We further measure which source rock has generated them and we observe the migration path. We gain insight into the HC production, the expulsion from the source rock and the accumulation in the reservoirs. As a result, the amount of HC that has leaked is available, and we can try to explain this leakage phenomenon through the observation of the complete evolution of the basin.

In this section we discuss the main effects of the different scenarios. A more complete analysis is provided by analysis-of-variance printouts and diagrams in the supplementary material.

The supplementary material also gives a table containing the main data concerning generation, expulsion, accumulation and leakage for each of the 24 scenarios. Data are in MMBOE (million barrels of oil equivalent) throughout the whole analysis. Some results concerning important data are presented in Figure 6. Here we depict six Pareto charts, showing which factors are more relevant in terms of variance decomposition, following the principles of a classical ANOVA with multiple factors, see Cochran and Cox (1992). The variance (or the equivalent total sum of squares) of the response is subdivided into five components, four related to the factors under consideration and one related to the residuals. The cumulative sums of the first four components are shown in the charts. In this way, these charts allow immediate identification of the most relevant factors, i.e. of the factors that bear the highest contributions to the total variance. Similar conclusions can be drawn from the boxplots presented in the supplementary material.

Model  Porosity  HF      Fault 3  TOC   Acc TE oil  Acc TE gas  Acc BE oil  Acc BE gas
1      high      cool    closed   high  580         10.66       340         32.35
2      low       cool    closed   high  247         3.29        90          10.41
3      high      normal  closed   high  776         35.65       172         43.79
4      low       normal  closed   high  220         10.33       35          6.28
5      high      hot     closed   high  736         31.16       2           29.63
6      low       hot     closed   high  212         9.17        1           9.02
7      high      cool    open     high  537         9.43        343         31.59
8      low       cool    open     high  247         3.15        91          10.23
9      high      normal  open     high  773         35.49       167         44.39
10     low       normal  open     high  218         10.22       35          12.95
11     high      hot     open     high  731         32.67       5           33.42
12     low       hot     open     high  207         9.90        7           10.58
13     high      cool    closed   low   265         4.72        213         18.61
14     low       cool    closed   low   218         2.68        95          7.47
15     high      normal  closed   low   659         30.37       106         40.36
16     low       normal  closed   low   218         10.03       38          12.75
17     high      hot     closed   low   528         22.92       1           29.76
18     low       hot     closed   low   206         8.71        5           9.43
19     high      cool    open     low   265         4.72        213         18.62
20     low       cool    open     low   218         2.68        95          7.47
21     high      normal  open     low   661         32.08       84          38.83
22     low       normal  open     low   218         10.03       37          12.75
23     high      hot     open     low   527         24.12       3           33.25
24     low       hot     open     low   206         9.45        8           10.55

Table 1: Experimental table, full factorial design with 4 factors (Porosity, Heat Flow, Fault 3 and TOC) and 2 × 3 × 2 × 2 = 24 level combinations in total. Accumulation results for the main anticlinal traps in MMBOE are reported. TE and BE refer respectively to the upper reservoir (Top East) and to the lower reservoir (Bottom East).
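The variance decomposition behind such Pareto charts can be sketched directly from Table 1. The example below is a minimal illustration for a single response, the Top East oil accumulation, and is not a reproduction of the charts in Figure 6, which are based on other quantities. It computes the main-effect sum of squares of each factor and treats everything else (interactions included) as residual; the function names are introduced here for illustration only.

```python
from itertools import product

LEVELS = {
    "TOC": ["high", "low"],
    "Fault 3": ["closed", "open"],
    "Heat Flow": ["cool", "normal", "hot"],
    "Porosity": ["high", "low"],
}
# Runs in the order of Table 1 (Porosity varies fastest, TOC slowest).
RUNS = [dict(zip(LEVELS, combo)) for combo in product(*LEVELS.values())]
# Top East (Mmd) oil accumulation [MMBOE], models 1 to 24 of Table 1.
TE_OIL = [580, 247, 776, 220, 736, 212, 537, 247, 773, 218, 731, 207,
          265, 218, 659, 218, 528, 206, 265, 218, 661, 218, 527, 206]


def mean(values):
    return sum(values) / len(values)


def main_effect_ss(runs, response, factor):
    """Sum of squares between the level means of one factor."""
    grand = mean(response)
    ss = 0.0
    for level in {run[factor] for run in runs}:
        group = [y for run, y in zip(runs, response) if run[factor] == level]
        ss += len(group) * (mean(group) - grand) ** 2
    return ss


grand_mean = mean(TE_OIL)
total_ss = sum((y - grand_mean) ** 2 for y in TE_OIL)
effects = {factor: main_effect_ss(RUNS, TE_OIL, factor) for factor in LEVELS}
residual_ss = total_ss - sum(effects.values())  # interactions and unexplained variation
ranking = sorted(effects, key=effects.get, reverse=True)

for name in ranking:
    print(f"{name:10s} {effects[name]:12.0f} {100 * effects[name] / total_ss:6.1f}%")
print(f"{'Residual':10s} {residual_ss:12.0f} {100 * residual_ss / total_ss:6.1f}%")
```

For this response porosity carries by far the largest share of the total sum of squares and Fault 3 the smallest. Because the design is balanced, the four main-effect sums of squares are orthogonal and never exceed the total.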

The generation phase is divided into oil and gas generation and further subdivided into the two source227

rocks that are responsible for the HC generation, respectively the Eek source rock and the Mlf source rock.228

The main factor driving the HC generation is the level of maturation of the source rock itself, that ultimately229

depends on the burial depth, the HF and the TOC. The analysis shows that higher HF allows an earlier230

and faster maturation and therefore a more abundant generation of gas in both the source rocks. For the231

11

Page 12: Building Bayesian networks from basin modeling scenarios ...pangea.stanford.edu/departments/ere/dropbox/scrf/documents/reports/... · 2 scenarios for improved geological decision

Hea

t Flo

w

TOC

Por

osity

Faul

t 3

Generation Tot

Sum

of s

quar

es

0.0e

+00

5.0e

+07

1.0e

+08

1.5e

+08

020

4060

8010

0C

umul

ativ

e P

erce

ntag

e

TOC

Hea

t Flo

w

Por

osity

Faul

t 3

Generation Eek

Sum

of s

quar

es

0e+

002e

+06

4e+

066e

+06

020

4060

8010

0C

umul

ativ

e P

erce

ntag

e

Hea

t Flo

w

TOC

Por

osity

Faul

t 3

Expulsion Tot

Sum

of s

quar

es

0.0e

+00

5.0e

+07

1.0e

+08

1.5e

+08

020

4060

8010

0C

umul

ativ

e P

erce

ntag

e

Por

osity

TOC

Hea

t Flo

w

Faul

t 3

Accumulation Mmd

Sum

of s

quar

es

020

0000

6000

0010

0000

0

020

4060

8010

0C

umul

ativ

e P

erce

ntag

e

Hea

t Flo

w

Por

osity

TOC

Faul

t 3

Accumulation Ou

Sum

of s

quar

es

050

000

1000

0015

0000

020

4060

8010

0C

umul

ativ

e P

erce

ntag

e

Hea

t Flo

w

TOC

Faul

t 3

Por

osity

Outflow Side

Sum

of s

quar

es

0.0e

+00

1.0e

+07

2.0e

+07

020

4060

8010

0C

umul

ativ

e P

erce

ntag

e

Figure 6: Pareto charts concerning six data extracted from Petromod. TOC and Heat flow are relevant factors when measuringHC generation and expulsion. Porosity is relevant for accumulation. Fault 3 appears to be a relevant factor just when measuringthe outflow from the lateral side of the traps.

oil generation there are no significant differences in the impact of HF for the Eek source rock. This means that oil generation has already reached its maximum potential when HF is at the medium level, which is consistent with our hypothesis. It turns out that when HF is high, most of the oil generated by the deeper source rock leaks out before being trapped. The overall effect is therefore a smaller oil accumulation in the Ou reservoir when HF is high. HF and TOC are also the main parameters controlling the quantity of expelled HC, i.e. the amount of HC that leaves the source rock after generation.

Regarding the size of the accumulations, we can see (again referring to Figure 6 and to the supplementary material) that the main factor is the porosity, followed again by the HF, especially for the Ou accumulations. It is natural that porosity is relevant, since a sandstone with good porosity can trap much more HC than a poor reservoir. It is interesting to notice how the effects of HF and TOC vanish, showing that the surplus of generated HC has been almost totally lost before the seal rock reached its sealing capacity. In fact, we notice that the oil accumulations in the Ou reservoir decrease markedly with


increased levels of HF, and are quite stable with respect to TOC.

Finally, Fault 3 in the western part has a strong effect only when it is leaking. When the fault is not present, there is no leaking through the fault's wall. On the contrary, when the fault is present, there is some leaking along the fault, especially when there is an early maturation (high HF). The fault has its clearest effect when measuring the outflow from the side. In contrast, the outflow from the top and the total outflow are governed by the HF and TOC, since the scenarios with early maturation leak most of the HC before the seal is adequately sealing.

We have run a similar analysis on the second major kind of data that we get from a BPSM analysis, i.e. the oil and gas accumulations. In the supplementary materials we attach Table 2 with the expected accumulation values at surface conditions for each of the 24 scenarios. Some of these results can also be found directly in Table 1.

We have distinguished four main accumulations, two in the eastern part of the basin, under anticlinal traps, and two in the western part of the basin, against Fault 3. We name the first two accumulations TE (Top East) and BE (Bottom East), and the latter two TW (Top West) and BW (Bottom West). TE and TW refer to the Mmd play, while BE and BW refer to the Ou play. The name Top refers to the upper reservoir, while the name Bottom refers to the lower reservoir. In the following sections these four accumulations will represent our four segments; TE and BE belong to the anticlinal prospect, while TW and BW belong to the fault prospect. The data confirm what was already observed in the previous analysis: oil accumulations in the Ou reservoir decrease with increasing HF, while gas accumulations increase from cool to normal HF, and then remain almost stationary. The main effect is the porosity of the reservoir rock, which accounts for the largest part of the total variability.

4. Building the Bayesian network

The experimental design setup gives better insight into the key factors responsible for the main geological processes in the basin. We will next use the multiple-scenario information to build a dependency structure for the segments. This takes the form of a BN that will be useful for decision making.

A BN is characterized by a set of nodes and edges. The nodes are random variables, which may be discrete or continuous. As an example, we will define nodes for trap presence (on/off), which are binary random variables. Edges define the conditional probability structure of the variables, connecting parents to children. For instance, we will define a parent node 'Trap Anticline' that can be on/off. This node has two children, 'TrapTopEast' and 'TrapBottomEast', which are also on/off and have conditional


probability distributions depending on the outcome of the parent node. Let $V$ be the set of all nodes, $x_v$ the variable at node $v \in V$, and $x$ the vector of all node variables. The joint probability model can then be defined by

$$p(x \mid \theta) = \prod_{v \in V} p(x_v \mid x_{\mathrm{pa}(v)}, \theta_v).$$

Here, $\mathrm{pa}(v)$ denotes the parents of node $v$. Further, $\theta$ denotes the set of model parameters required for the conditional probability tables (CPTs), where $\theta_v$ is the local parameter for node $x_v$. We show below how we can train, or learn, these parameter values from the multiple-scenario BPSM outputs.
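As a minimal sketch of this factorization (not the software used in this study), the snippet below evaluates the joint probability of the small trap example, one parent with two children, as a product of local CPT entries. The node roles follow the example above; the probability values are illustrative assumptions, not values learned from the BPSM scenarios.

```python
from itertools import product

# Hypothetical CPTs for illustration only (state 1 = on, 0 = off).
p_parent = {1: 0.5, 0: 0.5}                 # p(TrapAnticline)
p_child_given_parent = {                     # p(child | parent)
    1: {1: 0.9, 0: 0.1},                     # parent on: child on with probability 0.9
    0: {1: 0.0, 0: 1.0},                     # parent off: child is always off
}

def joint_probability(anticline, top_east, bottom_east):
    """p(x) = p(parent) * p(TrapTopEast | parent) * p(TrapBottomEast | parent)."""
    return (p_parent[anticline]
            * p_child_given_parent[anticline][top_east]
            * p_child_given_parent[anticline][bottom_east])

# The factorized joint must sum to one over all configurations.
total = sum(joint_probability(a, t, b) for a, t, b in product((0, 1), repeat=3))
p_all_on = joint_probability(1, 1, 1)
```

Only the local tables are stored; the full joint distribution is never tabulated, which is what keeps larger networks tractable.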

We have chosen to use a BN structure similar to that of Martinelli et al. (2011), and we have used software developed by Murphy (1999). The CPTs are then parameterized by incorporating basic geological mechanisms and allowing for local failure in the propagation of HC elements, and we train the parameters within this context. The formulation in Martinelli et al. (2011) appears to be a flexible way of modeling dependencies stemming from different geological elements (trap, source, reservoir). The separate assignment of these elements gives a generic model specification that is easy to interpret and communicate. Finally, a BN formulation allows explicit evaluation of the changes in the probabilities when single elements are observed, which leads to what-if studies or VOI calculations.

By using trap, source and reservoir elements in the BN, we avoid direct use of the factors involved in the DOE. We have seen, for example, that the HF acts at both source and trap level, and that the porosity affects both the accumulation and the leaking phase. To capture the same effects one would need a quadratic regression, such as that introduced in Wendebourg and Trabelsi (2005). The BN obviates such complex constructions.

4.1. Learning the network

The complete set of 24 scenarios, and the associated observations, are shown in Figure 7 (generation) and in Figure 8 (accumulation). We have used a standard k-means algorithm (Kaufman and Rousseeuw, 2005) with k = 2 (accumulation) or k = 3 (generation) to assess the thresholds for categorizing the data. The optimal choice of the number of levels has been dictated by the algorithm itself (two clusters appear clearly in the accumulation figures, three or more in the generation figures). Other statistical methods for discriminant analysis could be useful in this case, for example methods based on quantile regression or other hierarchical or centroid-based clustering methods. Note that in this way the data become a proxy for the knowledge about the geological elements that could potentially be observed at segment level.
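The thresholding step can be sketched as follows. This is a plain Lloyd-type k-means on scalar data, written here only for illustration; the quantile-based initialization, the midpoint rule for the class boundary and the accumulation values are assumptions of this sketch, not details reported for the study.

```python
import numpy as np

def kmeans_1d(values, k, n_iter=100):
    """Plain Lloyd iterations on scalar data; returns centroids and labels."""
    values = np.asarray(values, dtype=float)
    # Deterministic start: spread the initial centroids over the data quantiles.
    centroids = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assign each value to its nearest centroid.
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            values[labels == j].mean() if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

def thresholds_from_centroids(centroids):
    """Class boundaries taken as midpoints between neighbouring centroids."""
    c = np.sort(centroids)
    return (c[:-1] + c[1:]) / 2.0

# Hypothetical accumulation values (MMBOE), not the study's data.
accumulation = [210, 230, 250, 260, 820, 850, 900, 910]
centroids, labels = kmeans_1d(accumulation, k=2)
threshold = thresholds_from_centroids(centroids)[0]
states = np.where(np.asarray(accumulation) > threshold, "high", "low")
```

With k = 3 the same function returns two thresholds, which would separate low, medium and high generation.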

We next consider the three main geological elements (trap, reservoir and source) separately. The BN


[Figure 7 panels, each plotted against experiment number: Generation Total; Generation Mlf; Generation Eek; Generation Mlf Gas; Generation Mlf Oil; Generation Eek Gas; Generation Eek Oil.]

Figure 7: Data for learning the source network. The x-axis represents the 24 experiments. Top: values for the HC generation. Middle: values for Eek and Mlf generation. Bottom: values for oil and gas generation in each of the Eek and Mlf source rocks. Values in MMBOE.

model we have established is shown in Figure 9.

• Trap: We have developed a network with 6 nodes: two parents, TrapAnticlinal and TrapFault, and four children, TrapTE, TrapBE, TrapTW and TrapBW. The marginal probabilities for the top nodes are {0, 1} for the anticlinal trap and {0.5, 0.5} for the fault trap. These are set by direct learning from the DOE output. The local CPTs for the children nodes include the possibility of a local failure, quantified by the success probability $\theta_T$ (∼0.9). This allows strong and effective learning when the fault trap presence is confirmed or ignored.

• Reservoir: The reservoir network is another small network, with 7 nodes: one common parent that represents the total accumulation, two mid-level parents that represent the Mmd and Ou reservoirs respectively, and four children representing oil and gas accumulations in the reservoirs (ResMmdGas, ResMmdOil, ResOuGas and ResOuOil). The nodes can be efficiently learned through a simple BN algorithm, given the data provided by the BPSM simulations.

The learning process follows the classical maximum likelihood procedure with complete data using the


[Figure 8 panels, each plotted against experiment number: Accumulation Total; Accumulation Mmd; Accumulation Ou; Accumulation Mmd Gas; Accumulation Mmd Oil; Accumulation Ou Gas; Accumulation Ou Oil.]

Figure 8: Data for learning the reservoir network. The x-axis represents the 24 experiments. Top: values for the accumulated HC. Middle: values for Mmd and Ou accumulation. Bottom: values for oil and gas accumulation in each of the Mmd and Ou reservoirs. Values in MMBOE.

joint model (Cowell et al., 2007). Because of conditional independence, and the database output from the DOE, we can maximize each term separately. This means that we can restrict our attention to the local term of interest and find the maximum likelihood estimate $\hat{\theta}_v$ locally, based on our database:

$$\hat{\theta}_{v}^{x_v} = \frac{\#(x_v \wedge x_{\mathrm{pa}(v)})}{\#(x_{\mathrm{pa}(v)})},$$

i.e. the CPTs are estimated by the ratio of the corresponding counts in the database. For the top nodes, we just count the fractions directly. In our case we update on the basis of a limited number of experiments. Different and possibly more complex prior distributions, such as Dirichlet priors, can be assigned to the top nodes. When there are missing or incomplete data, more refined techniques are suggested, such as Expectation-Maximization (EM) or penalized EM algorithms; see e.g. Jordan (1998) and Cowell et al. (2007).

In our example all reservoir nodes are binary (two states, high and low), and we impose a threshold for


the accumulations being larger than a certain value. These values are the separating planes indicated by the k-means algorithm (Figure 8).

With this procedure we can derive explicitly the correlation between the different nodes. Note that we have not imposed these correlations, but derived them from the data. They can nonetheless be tuned if the values contradict expert belief or other sources of data. For more details we refer to Appendix A. Here we just provide an example. For the reservoir subnetwork just described, we can check the correlation between some of the bottom nodes: the correlation between ResMmdGas and ResMmdOil is 0.84, while the correlation between ResOuOil and ResMmdOil is 0.32. The first result is natural, since a good porosity of the reservoir rock increases its ability to hold both oil and gas. The second result tells us that the porosities in the Ou reservoir and in the Mmd reservoir are weakly but positively correlated. This is the effect of having the same sandstone in both reservoir rocks, and of the choice of changing the porosity in tandem in the two reservoir rocks (either poor-poor or good-good). A confirmation that our discretisation has not altered the results too dramatically comes from comparing the correlations computed on the BN with the correlations of the data series. We note that Accumulation Mmd Gas and Accumulation Mmd Oil have a correlation coefficient of 0.90, while Accumulation Mmd Oil and Accumulation Ou Oil have a correlation coefficient of 0.25. These results are covered extremely well by the distributions of the network. Finally, we report in Table 2 the conditional probabilities of the given variables (note that they are not in a parent-child relationship), derived from our BN; the marginals for the state high for the variables ResMmdGas and ResOuOil are 0.33 and 0.25 respectively. The Bayesian computations behind these and other results are discussed in detail in Appendix A.

ResMmdGas / ResMmdOil   low     high        ResOuOil / ResMmdOil   low     high
low                     1       0           low                    0.867   0.133
high                    0.2     0.8         high                   0.575   0.425

Table 2: Conditional Probability Tables for the variables ResMmdGas vs ResMmdOil (left) and ResOuOil vs ResMmdOil (right), within the Reservoir subnetwork.

• Source: The source network is more complicated, since we have to take into account two phenomena that interact with a difficult correlation structure, namely gas generation and oil generation. As previously discussed, for the shallower source rock an increase in HF has the twofold effect of higher oil and higher gas generation. For the deeper source rock, on the other hand, it affects just the gas generation. TOC affects both kinds of generation in similar ways. We learn the statistical effect


of this behavior from the DOE outputs concerning the generation phase. We include a correlation structure with 3 levels: a top node for the total generation, intermediate nodes for the Mlf and Eek generation, and bottom nodes for the gas and oil generation in each of the source rocks.

In this case we have assigned three states to all the nodes: high, medium and low generation. Most of the CPTs are learned directly from the data, using GenTot, GenMlf, GenEek, GenMlfOil, GenMlfGas, GenEekGas and GenEekOil along with the thresholds discretizing the data. All the nodes are discrete, with three possible states, i.e. k = 3 in Figure 7.
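The counting estimator used for the CPTs in the list above can be sketched as follows. This is an illustrative implementation of the ratio-of-counts rule for complete data; the function name and the six records are hypothetical and do not reproduce the 24 BPSM scenarios.

```python
from collections import Counter

def learn_cpt(records, child, parents):
    """Maximum likelihood CPT: #(child state and parent states) / #(parent states)."""
    joint_counts = Counter()
    parent_counts = Counter()
    for row in records:
        pa_state = tuple(row[p] for p in parents)
        joint_counts[(pa_state, row[child])] += 1
        parent_counts[pa_state] += 1
    return {
        (pa_state, x): n / parent_counts[pa_state]
        for (pa_state, x), n in joint_counts.items()
    }

# Hypothetical discretized scenarios (not the study's database).
records = [
    {"ResMmd": "high", "ResMmdOil": "high"},
    {"ResMmd": "high", "ResMmdOil": "high"},
    {"ResMmd": "high", "ResMmdOil": "low"},
    {"ResMmd": "low", "ResMmdOil": "low"},
    {"ResMmd": "low", "ResMmdOil": "low"},
    {"ResMmd": "low", "ResMmdOil": "high"},
]
cpt = learn_cpt(records, child="ResMmdOil", parents=["ResMmd"])
```

Combinations that never occur in the database receive no entry, i.e. probability zero. With only a few scenarios this can be too sharp, which is one motivation for the prior distributions mentioned above.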

Finally, we gather the information that we get from source, reservoir and trap in a single node, using our geological understanding of the process. We know that the source rock is essential for the presence of HC in the prospect, while a poor reservoir quality or a poor trap makes a commercial discovery in the prospect less likely. We will next discuss other considerations for joining the last part of the network.
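The correlations quoted for the reservoir subnetwork can be checked approximately from Table 2 and the quoted marginals. The sketch below assumes that the rows of Table 2 condition on ResMmdOil, so that the entries are P(ResMmdGas | ResMmdOil) and P(ResOuOil | ResMmdOil); this reading is an assumption, as is deriving P(ResMmdOil = high) from the quoted marginal 0.33 of ResMmdGas.

```python
import math

def binary_correlation(p_a_high, p_b_high_given_a_high, p_b_high_given_a_low):
    """Pearson correlation of two 0/1 variables from a marginal and a conditional table."""
    p_b_high = (p_b_high_given_a_high * p_a_high
                + p_b_high_given_a_low * (1.0 - p_a_high))
    p_both_high = p_b_high_given_a_high * p_a_high
    cov = p_both_high - p_a_high * p_b_high
    return cov / math.sqrt(p_a_high * (1.0 - p_a_high) * p_b_high * (1.0 - p_b_high))

# Assumed reading of Table 2: P(ResMmdGas = high | ResMmdOil = high) = 0.8 and
# P(ResMmdGas = high | ResMmdOil = low) = 0, hence P(ResMmdOil = high) = 0.33 / 0.8.
p_mmd_oil_high = 0.33 / 0.8
corr_gas_oil = binary_correlation(p_mmd_oil_high, 0.8, 0.0)
corr_ou_mmd = binary_correlation(p_mmd_oil_high, 0.425, 0.133)
```

Under these assumptions the two values come out close to the quoted 0.84 and 0.32; small differences are expected from the rounding of the tabulated probabilities.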

4.2. Gaussian nodes

So far in the BN building, we have not used the accumulation volumes extracted from the multiple-scenario BPSM. For learning the reservoir network we have used joint layer accumulation values and not prospect/segment values. We will now incorporate this information in the bottom nodes of the network. It seems reasonable to have discrete nodes in the top parts of the network, since attributes such as source, reservoir and trap are on/off or multi-level features. In the bottom part of the network it may be more realistic to have continuous nodes that mimic the actual behavior of the simulated scenarios. We therefore split each of the bottom nodes TE, BE, TW and BW into two nodes, one for gas volume and the other for oil volume, and state that they represent accumulation distributions whose mean and (possibly) variance depend on the states of their parents. The simultaneous use of discrete and continuous variables in BNs has been explored in Chang and Fung (1995) and Friedman and Goldszmidt (1996). A good inference algorithm is presented in Murphy (1999); the algorithm used is the Junction Tree Algorithm presented in Cowell et al. (2007). The related CPTs have to be assessed; for example, the conditional probability density of BEg (BE gas) is:

$$p_{\mathrm{BEg}}(x \mid \mathrm{TraBE}, \mathrm{ResOuGas}, \mathrm{SouEekGas}) \sim N(\mu_{\mathrm{BEg}}, \sigma^2_{\mathrm{BEg}}),$$

where $\mu_{\mathrm{BEg}}$ is the conditional mean value and $\sigma_{\mathrm{BEg}}$ is the conditional standard deviation of this Gaussian distribution. This means assessing 12 mean and variance parameters (2 states for Trap and Reservoir and 3 for Source) for each of the 8 nodes. We use the accumulations from our experimental design as references for the mean values of our Gaussian distributions. The choice of using Gaussian nodes comes from their wide


use as continuous nodes when dealing with BNs, particularly because of their simple parameter estimation properties via ML. Furthermore, it is reasonable to believe that, with all the input parameters fixed, the accumulation distribution will be Gaussian. This is not in contrast with the classical lognormally shaped distribution of the total reserves, since the lognormal shape is due to the contribution of several uncertainty factors, while our nodes assume that for each state the set of parameters is fixed. Marginalizing over the discrete top nodes, we get a mixture of Gaussian distributions, which produces results consistent with the classical theory.

[Figure 9 node labels. Reservoir branch: ResTop, ResMmd, ResOu, ResMmdGas, ResMmdOil, ResOuGas, ResOuOil. Trap branch: TraFault, TraAnti, TraTE, TraBE, TraTW, TraBW. Source branch: SouTop, SouMlf, SouEek, SouMlfGas, SouMlfOil, SouEekGas, SouEekOil. Bottom nodes: TEg, BEg, TWg, BWg, TEo, BEo, TWo, BWo.]

Figure 9: BN with trap (top), reservoir (left) and source (lower right) branches. Top nodes are all discrete, while bottom nodes are Gaussian, as explained in Section 4.2.

Further, we include the possibility of local failure of one element, with a reduced volume, according to Table 3. Let the parameters $\gamma_R$ and $\gamma_S$ be the local importance factors for the elements source and trap. The effect is that the failure of a single element can still produce minor accumulations. This occurs, for instance, if we believe that low/high states for factors like porosity do not totally preclude the accumulation of HC, but simply produce a marked reduction in the quantity (as seen in the simulations), due to unpredictable local variations as a consequence of porosity reducing or enhancing effects. The choice of the parametrization for Table 3 has some immediate effects: we are implicitly assuming, for example, that if we find a volume equal


to 0 in a segment and we know that a trap is in place, we have to blame the source for this lack of volume (rows 3 and 4 in the table), but the same situation can also occur when both trap and reservoir fail (rows 5 and 9), whatever the outcome of the source, as is natural to assume. When just one of these two elements fails, on the other hand, we still allow a marginal possibility of finding HC, and this is summarized in the parameters $\gamma_R$ and $\gamma_S$. We have fixed $\gamma_R$ and $\gamma_S$ to be equal to 0.2. The effects are multiplicative, therefore a factor 0.2 reduces the expected accumulations to 20% of the expected accumulation with all the elements in place. The choice of a factor 0.2 is due to the need to cover approximately the accumulations from our experimental design. This parameter could possibly, in larger studies, be learned directly from data as well.

The numbers 1, 2 and 3 in Table 3 represent the different states of the nodes. For the trap node, state 1 corresponds to the failure state (trap not present), while state 2 corresponds to the success case (trap present). For the source node, state 1 corresponds to the failure case (charge not present), state 2 corresponds to the intermediate case (weak charge), and state 3 corresponds to the success case (strong charge).

The second important point to discuss is how to assign the variances to the Gaussian distributions. We acknowledge that this is a crucial point, with large and important implications when analyzing the effect of the nodes' behaviour, as shown in the previous paragraph. We have decided to assign the variances so as to have a constant coefficient of variation $\sigma/\mu$ in all the possible scenarios described in Table 3; a constant coefficient of variation will be our standard hypothesis for the variability of HC volumes. We need to stress, though, that we do not have a definitive answer or suggestion on this point, since we will never be able to consider and describe all the scenarios that could possibly happen in the basin that we are considering.
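The parametrization of the Gaussian nodes can be sketched as a lookup of the mean multiplier from Table 3, followed by a standard deviation proportional to the mean. The function names, the reference mean and the value of the coefficient of variation used in any call are hypothetical; only the multipliers and the value 0.2 for the two factors are taken from the text.

```python
GAMMA_R = 0.2   # value fixed in the text
GAMMA_S = 0.2   # value fixed in the text

def mean_multiplier(reservoir, trap, source, gamma_r=GAMMA_R, gamma_s=GAMMA_S):
    """Multiplicative factor on the Gaussian mean, transcribed from Table 3.

    reservoir, trap: 1 = failure, 2 = success.
    source: 1 = no charge, 2 = weak charge, 3 = strong charge.
    """
    if source == 1 or (reservoir == 1 and trap == 1):
        return 0.0
    if reservoir == 2 and trap == 2:
        return 1.0 if source == 3 else gamma_r + gamma_s
    # Exactly one of reservoir and trap has failed.
    base = gamma_r if reservoir == 2 else gamma_s
    return 2 * base if source == 3 else base

def gaussian_parameters(full_mean, cv, reservoir, trap, source):
    """Conditional mean and standard deviation under a constant coefficient of variation."""
    mu = full_mean * mean_multiplier(reservoir, trap, source)
    return mu, cv * mu
```

For a hypothetical reference mean of 300 MMBOE and a coefficient of variation of 0.3, `gaussian_parameters(300.0, 0.3, 1, 2, 2)` gives a mean of 60 and a standard deviation of 18. States with a zero multiplier collapse to a point mass at zero volume.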

The complete BN is shown in Figure 9. Since the accumulations cannot be negative, we concentrate at 0 the probability mass corresponding to negative values. Such negative values are a consequence of the Gaussian nodes chosen for the accumulations with a set of fixed parameters. The resulting distributions will therefore be a mixture of truncated Gaussian distributions.
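As an illustration of how the mass at zero arises, the sketch below moves the negative part of each Gaussian component to the point 0 and computes the resulting probability of a non-zero volume for a mixture. The mixture weights and parameters are hypothetical, and reading the chance of success as one minus the mass at zero is an assumption of this sketch.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mass_at_zero(mu, sigma):
    """Probability that a N(mu, sigma^2) draw is <= 0, which is lumped at 0."""
    if sigma == 0.0:
        return 1.0 if mu <= 0.0 else 0.0
    return normal_cdf(-mu / sigma)

def chance_of_success(components):
    """1 - P(volume = 0) for a mixture given as (weight, mu, sigma) triples."""
    return 1.0 - sum(w * mass_at_zero(mu, sigma) for w, mu, sigma in components)

# Hypothetical mixture: a failure state, a reduced-volume state and a full-volume
# state; the non-zero components share a coefficient of variation of 0.5.
components = [(0.1, 0.0, 0.0), (0.3, 60.0, 30.0), (0.6, 300.0, 150.0)]
cos = chance_of_success(components)
```

With a constant coefficient of variation, every component with a positive mean loses the same fraction of its mass to zero, since that fraction depends only on the ratio of mean to standard deviation.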

The effects of this parametrization on the HC distributions can be seen in Figure 10. The distributions are truncated at 0, resulting in mixed discrete-continuous distributions. The probabilities of discovery, or chance of success (COS), are 0.919 for Top Anticlinal and 0.797 for Bottom Anticlinal. Since our approach is completely data-driven in this case, we cannot compare it directly with a classical approach based on the multiplication of several risk factors, such as COS = P(trap) · P(reservoir) · P(source). Note, though, that BNs can also be used efficiently in that setting, as shown in Martinelli et al. (2011) and Martinelli et al. (2012).

The distributions are multimodal, and the different modes reflect the likelihood of being in each of the


Reservoir   Trap   Source   µ
1           1      1        0
2           1      1        0
1           2      1        0
2           2      1        0
1           1      2        0
2           1      2        γR
1           2      2        γS
2           2      2        γR + γS
1           1      3        0
2           1      3        2 ∗ γR
1           2      3        2 ∗ γS
2           2      3        1

Table 3: Conditional Probability Table for the oil and gas accumulations in the four prospects; the column µ represents the multiplicative factor assigned to the mean of the Gaussian conditional distribution. The numbers 1, 2 and 3 in Table 3 represent the different states of the nodes. For the trap node, state 1 corresponds to the failure state (trap not present), while state 2 corresponds to the success case (trap present). For the source node, state 1 corresponds to the failure case (charge not present), state 2 corresponds to the intermediate case (weak charge), and state 3 corresponds to the success case (strong charge).

24 configurations taken into account. The comparison between the empirical distribution (24 configurations, shown with blue stars) and the BN distribution can be found in Figure 11. Here the bivariate distributions of oil and gas for the main TE (left) and BE (right) accumulations are shown. First, there is a positive correlation between the oil and gas accumulations, due to the positive effect of TOC and HF on the maturation of the source rock. Second, the BN distribution covers the empirical distribution quite well, though there are discrepancies due to the thresholds introduced in Section 4.1 and the prior values (again learned from the data) imposed on the upper nodes of the network. Recall that the main goal of this work is not to reproduce exactly the BPSM behaviour, but to integrate the results in a probabilistic framework where it is easier to evaluate the effect of particular observables. Nonetheless, since we have considered quite extreme settings in our parameter space, we have good reasons to believe that our distributions would constitute an ideal contour line (envelope) of a much larger range of scenarios than our original 24, and therefore would capture most of the uncertainties that characterize the basin modelling behaviour of this case study.

For economic evaluation purposes it is interesting to analyze the inverse cumulative distributions of recoverable HC. In order to compute such distributions we need to take into account the recovery factor, which is estimated to be 0.45 for oil accumulations and 0.75 for gas accumulations. In Figure 12 we show the inverse cumulative distributions for segments TE and BE of the anticlinal prospect. The black line represents the contribution of the oil part, while the red line represents the added value brought by the gas accumulation. As we can see, the gas accumulation is more important for prospect BE, since this has a source


[Figure 10 panels, x-axis: Volume (MMBOE). Gas panel: Distribution Gas Ou (BE) and Distribution Gas Mmd (TE). Oil panel: Distribution Oil Ou (BE) and Distribution Oil Mmd (TE).]

Figure 10: Oil and gas volume distributions in prospects BE and TE. The multimodality of the distribution is due to failure of local geological elements that do not totally jeopardize the likelihood of finding HC.

rock (Eek) maturity level sufficient to produce commercial quantities of gas.
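The construction of these curves can be sketched as follows, reading the inverse cumulative distribution as the probability of exceeding a given recoverable volume, which is an assumption about the plotted quantity. The recovery factors are those quoted above; the in-place volumes are hypothetical and stand for equally likely outcomes.

```python
import numpy as np

OIL_RECOVERY = 0.45   # recovery factor quoted in the text
GAS_RECOVERY = 0.75   # recovery factor quoted in the text

def recoverable(oil_in_place, gas_in_place):
    """Recoverable volume (MMBOE) from in-place oil and gas volumes."""
    return (OIL_RECOVERY * np.asarray(oil_in_place, dtype=float)
            + GAS_RECOVERY * np.asarray(gas_in_place, dtype=float))

def exceedance(samples, levels):
    """P(volume > level) for each level, estimated from equally weighted samples."""
    samples = np.asarray(samples, dtype=float)
    return np.array([(samples > level).mean() for level in levels])

# Hypothetical in-place volumes for four equally likely outcomes (not study results).
oil = [0.0, 100.0, 200.0, 400.0]
gas = [0.0, 20.0, 40.0, 40.0]
total = recoverable(oil, gas)                       # oil plus gas contribution
curve = exceedance(total, levels=[0.0, 50.0, 100.0])
```

Calling `recoverable(oil, np.zeros(4))` gives the oil-only volumes, so the gap between the two exceedance curves shows the added value of the gas.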

These distributions can be updated when more information becomes available. Let us focus on the effect of added information on the gas accumulation in segment BE of the anticlinal prospect. Let us assume that we receive information that confirms the presence/absence of the reservoir or trapping condition in that prospect. The network is updated, and the conditional accumulation distributions can be seen in Figure 13. The effect of confirming an adequate reservoir layer is much stronger than that of a positive trap, since the prior for the anticlinal trapping to be adequate is already as large as 0.9, while the uncertainty about the quality of the reservoir layer (porosity) is much larger.

5. Applications for decision making

In this section we demonstrate a couple of different applications of the network.


[Figure 11 panels: two bivariate density maps with x-axis Volume Gas (MMBOE), y-axis Volume Oil (MMBOE) and a colour scale for the density.]

Figure 11: Oil and gas volumes joint bivariate distributions. Values are given for the accumulations TE (left) and BE (right).

5.1. What-if scenarios

We are interested in the behavior of the network when a HC column is observed in another prospect. In order to mimic a real situation, we consider drilling a well on the anticlinal prospect, and observe the impact of various evidence in TE, the top segment, on BE, the bottom segment (Figure 14). We then compare with similar observations made on the fault prospect BW (Figure 15).

In Figure 14 we see that even a rich observation in TE is not sufficient to resolve the bimodality of the marginal distribution, since the uncertainty about the quality of the reservoir remains (TE and BE belong to two different reservoirs). In Figure 15 we see that both an extremely poor and a rich observation in the fault prospect BW can substantially change the shape of the posterior oil distribution for BE, as both segments belong to the same play. As we have already pointed out, a positive HC column observation in a high-risk prospect such as BW, which confirms for the play both the quality of the reservoir and the existence of a charge, has a higher impact on BE than an observation in TE, which belongs to a different play.

5.2. Value of Information

The Value of Information (VoI) and the Value of Perfect Information (VoPI) are indices that are becoming popular in the industry for evaluating the economic merit of acquiring a certain set of data or drilling an exploration well; see Eidsvik et al. (2008) and Bhattacharjya et al. (2010) for details. In this study we will consider just the VoPI, meaning that we consider the possibility of getting perfect information (oil/gas vs dry) from an exploration well.


[Figure 12 panels, x-axis: Volume, MMBOE; y-axis from 0 to 1; curves: Oil resources and Oil+gas resources. Panel titles as printed: "Inverse cumulative distribution of recoverable resources, Anticlinal Mmd (TE)" and "Inverse cumulative distribution of recoverable resources, Anticlinal Mmd (BE)".]

Figure 12: Inverse Cumulative Distribution of recoverable resources for segments TE (Top) and BE (Bottom) of the anticlinal prospect. In black, volumes related to the oil accumulations; in red, volumes composed of the joint contribution of oil and gas accumulations.

Geologists and decision makers need to establish the probability of discovering recoverable HC larger than an economic threshold. We can use a similar criterion for assessing the VoPI, saying that a value for expected resources falling below the economic threshold is equivalent to having no discovered resources at all. Furthermore, when we compute the VoPI we always have to specify the cost of collecting that information. In this case, since our reference units are volumes expressed in MMBOE, we will express the costs in the same units.

Given these premises, the prior value for threshold $t$ and cost $C$ would be:

$$PV = \sum_{j \in \{BE, TE, BW, TW\}} \max\left\{ \int_{x>t} P(x_j = x) \cdot v_x \, dx - C,\ 0 \right\}$$


0 50 100 150 200 250 300 350 400 450 5000

1

2

3

4

5

6x 10

−3

Volume (MMBOE)

marginal BE oil

marginal BE oil | Res BE OK

marginal BE oil | Tra BE OK

Figure 13: Distribution of the oil accumulation in segment BE before and after observing positive Reservoir and Trap evidencein the same segment. A positive evidence about the reservoir quality will not just increase the COS but also affect the expectedvolume, as we can see from the dashed line. A positive evidence about the trap will have a much smaller impact, since its priorvalue is already close to 1.

[Figure 14 (plot): horizontal axis Volume (MMBOE), 0 to 500; vertical axis scale ×10⁻³. Curves: marginal BE oil; marginal BE oil | TE oil Acc = 0; marginal BE oil | TE oil Acc = 350; marginal BE oil | TE oil Acc = 700.]

Figure 14: Distribution of the oil accumulation in segment BE before and after observing an oil column of different height in segment TE. The evidence collected in segment TE efficiently propagates through the network and has an impact on the expected volume distribution for prospect BE; the larger is the discovery, the bigger is the impact.

The value of having free clairvoyance in segment $i$ would then be:

$$VFC(i) = \int \sum_{j \in \{BE,\,TE,\,BW,\,TW\}} \max\left\{ \int_{x>t} P(x_j = x \mid x_i = e)\, v_x \, dx - C,\; 0 \right\} P(x_i = e)\, de.$$


[Figure 15 (plot): horizontal axis 0 to 500; vertical axis 0 to 0.01. Curves: marginal BE oil; marginal BE oil | BW oil Acc = 0; marginal BE oil | BW oil Acc = 5; marginal BE oil | BW oil Acc = 10.]

Figure 15: Distribution of the oil accumulation in segment BE before and after observing an oil column of different height in segment BW. The evidence collected in segment BW efficiently propagates through the network and has an impact on the expected volume distribution for prospect BE; the larger is the discovery, the bigger is the impact. The impact is overall larger than that shown in Figure 14, since a relevant oil accumulation in segment BW is extremely unlikely.

and finally:

$$VoPI(i) = VFC(i) - PV.$$

In these expressions the quantity $v_x$ is intended to be proportional to the recoverable resources. It is worth noticing that when $i = j$ the integral collapses to a single point, and we observe what is called self-evidence, i.e. the effect of observing a prospect itself.

Since the distributions are numerically approximated, the integrals are computed through a discretization, and this makes the process computationally intensive.
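To illustrate how the discretized VoPI computation proceeds, here is a minimal sketch for a toy case with only two segments, A and B, sharing one volume grid. It assumes $v_x = x$, uses a made-up joint distribution, and includes the self-evidence term (when the observed segment is A itself, its conditional distribution is a point mass). The function names are hypothetical and this is not the code used in the study.

```python
def drill_value(volumes, pmf, threshold, cost):
    """max(sum_{x>t} x * P(x) - C, 0) for one segment (v_x = x assumed)."""
    expected = sum(x * p for x, p in zip(volumes, pmf) if x > threshold)
    return max(expected - cost, 0.0)


def vopi_observe_a(volumes, joint, threshold, cost):
    """VoPI of observing segment A, given joint[a][b] = P(x_A, x_B)."""
    n = len(volumes)
    p_a = [sum(joint[a]) for a in range(n)]
    p_b = [sum(joint[a][b] for a in range(n)) for b in range(n)]
    # Prior value: decide on each segment using its marginal only.
    pv = (drill_value(volumes, p_a, threshold, cost)
          + drill_value(volumes, p_b, threshold, cost))
    # Value with free clairvoyance: average the best decisions over outcomes e.
    vfc = 0.0
    for a in range(n):
        if p_a[a] == 0.0:
            continue  # outcome never occurs, nothing to condition on
        point_mass = [1.0 if k == a else 0.0 for k in range(n)]  # self-evidence
        b_given_a = [joint[a][b] / p_a[a] for b in range(n)]
        vfc += p_a[a] * (drill_value(volumes, point_mass, threshold, cost)
                         + drill_value(volumes, b_given_a, threshold, cost))
    return vfc - pv


# Illustrative two-point grid (MMBOE) and positively correlated segments.
volumes = [0.0, 100.0]
joint = [[0.4, 0.1],
         [0.1, 0.4]]
vopi = vopi_observe_a(volumes, joint, threshold=0.0, cost=60.0)
```

In this toy example neither segment is worth developing a priori, so the whole value of the observation comes from the cases in which a discovery in A makes both segments attractive. The cost of the sketch grows with the number of grid points times the number of segments, which is consistent with the remark that the discretized computation is intensive.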

When computing the VoPI, we state that a certain prospect will be drilled if its expected recoverable resources exceed a certain threshold. We have considered two possible scenarios for t, t = 0 and t = 80. The value of t may also represent risk-averse behavior of the decision maker: the higher t is, the more conservative the decision maker. For each scenario we have computed the VoPI for the four prospects for different costs C, representing the operational cost connected to developing the prospect. We have decided not to introduce monetary units, but to express everything in MMBOE, which is the reference unit for the prospects' volumes; for this reason C is expressed in the same unit. We have repeated the procedure with and without the self-evidence.

Results are in Figures 16 and 17. We can immediately see two major spikes, corresponding to the range of costs that affects decisions in the biggest prospects, namely BE and TE. This means that for operation costs in the regions close to the spikes, having the possibility of observing the state of one of the prospects would appreciably change our decision about the other prospects. We recognize that the first spike corresponds to a decision change in prospect BE, and the second spike to a change in prospect TE. This is confirmed both by the VoPI computed without self-evidence (the spikes corresponding to self-evidence disappear, see for example the dashed line of TE that goes to 0 in Figure 17 for high values of C), and by observing which prospects have the highest impacts: BW in the case of the first spike and TW in the case of the second spike. The geological reasons have been discussed when commenting on Figures 14 and 15 (the effect of confirming an adequate reservoir layer given by an observation in the fault prospect is stronger than that of an adequate trap, since from our data the prior likelihood for the anticlinal trap is much larger than the prior likelihood of a good reservoir quality), and they are confirmed by this VoPI analysis, which compresses the information into outputs useful for decision making. Similar discussions and considerations can be found in Martinelli et al. (2011).

[Figure 16 (two panels): "VOPI, t=0" and "VOPI, t=80". Horizontal axis: Cost C, 0 to 200; vertical axis: 0 to 50. Curves: VOPI TE; VOPI BE; VOPI TW; VOPI BW.]

Figure 16: Value of Perfect Information for the four prospects BE, TE, BW and TW, as a function of the threshold t and of the project/operation costs C. The two major spikes correspond to the range of costs that affects decisions in the biggest prospects, namely BE and TE.

[Figure 17 (two panels): "VOPI without Self-Evidence, t=0" and "VOPI without Self-Evidence, t=80". Horizontal axis: Cost C, 0 to 200; vertical axis: 0 to 50. Curves: VOPI TE; VOPI BE; VOPI TW; VOPI BW.]

Figure 17: Value of Perfect Information without self evidence for the four prospects BE, TE, BW and TW, as a function of the threshold t and of the project/operation costs C. The two major spikes correspond to the range of costs that affects decisions in the biggest prospects, namely BE and TE.

The values of VoPI that we get from such an analysis must be compared with the exploration cost necessary to get that information, again expressed in MMBOE. As we can see, VoPI values are much smaller than the operation costs C, and this is consistent since they need to be compared to the exploration costs and not to the operation costs. If the exploration cost is, say, equal to 10 MMBOE, the threshold is fixed to t = 0, and the development costs C are equal to 50, it is optimal (more informative) to focus on segment BE; in this case TE is not very informative since its high volume makes it profitable anyway. If the costs C are equal to 150, on the other hand, it is more informative to explore TE; in this situation even TW could be a good candidate (its VoPI lies above 10 MMBOE for C = 150), while TE becomes irrelevant since its volume does not cover the operation costs and it is less informative than TW for estimating the outcome of BE. The last comment is about the threshold t: we can see that higher values of t lead to a shift of the VoPI peaks towards smaller costs C. This is reasonable since higher values of t reduce the chance of the prospects being commercially viable, and therefore make them interesting only if the operational costs are smaller. This effect is bigger for prospects with low volumes (first and foremost TW and BW, but also BE), while it is almost impossible to detect when the volume is large (TE), since the imposed threshold makes this prospect very appealing in any situation.

6. Guidelines for practical use

The example discussed in this paper is simplified in several respects compared with a real-world scenario. For this reason, we point out here some indications that should guide a practical application of this methodology.

• How reliable is a process based simply on a basin model?

In the present study we rely completely on a single basin model, whose parameters can change, but whose structure is essentially fixed. This means that we can correct for possible discrepancies in many geometrical or geological parameters, but we are assuming that all the relevant information comes from this unique source. This is clearly a simplification due to the necessity of presenting the workflow, but we believe that improvements are possible. The first important point is that the approach presented here does not include the experts' knowledge as a source for driving the probabilities of the risk factors. This is relevant since expert knowledge is commonly used in the industry for quantifying the risk factors of the different geological elements. One idea for integrating it is to build this knowledge into the network's structure, as proposed in Martinelli et al. (2012). Another idea is to encode it in a Dirichlet prior over the network's parameters, which is subsequently updated with the results of the experiments. The second point is whether, together with expert opinion, other data-driven sources of information can be used in parallel with basin modeling to improve the estimate of the segments' and prospects' COS. In this regard, there are recent contributions that aim to use seismic or EM data directly to update the segments' chances of success; see for example Kolbjornsen et al. (2012).

• Is an experimental design the ideal way to train the BN?

The experimental design plan is a simple yet complete way of handling the problem. If the extreme points of the design plan are able to capture the extremes of the distributions, and if these distributions are reasonably behaved (not too skewed towards one of the extremes), the method is robust. The problem is that in real life we often work with skewed distributions, possibly even multi-modal. In this case a simple DOE approach such as the one proposed here is no longer sufficient, and more complex approaches such as those based on Response Surface Models (RSM) presented in Wendebourg (2003) should be considered. The bottleneck in using this approach would be the discretization procedure: to put it simply, it is useless to be able to explore the sample space in detail if afterwards we have to summarize our results into a few discrete outcomes. More complex BNs with continuous nodes should probably be taken into account in this case.

• Is it consistent to modify the parameters one at a time, disregarding the possible interdependencies?

Even assuming that the distributions are not skewed and that a DOE approach is a reasonable way to integrate the uncertainty, there may be problems due to the incompatibility of some configurations in the likely case where the input parameters are correlated. In this case we should ideally first build the joint distribution over the input parameters, and then draw configurations from that distribution. Our suggestion remains to apply Occam's razor when possible, and to avoid unnecessary constructions that make the final result more difficult to read and to interpret. A nice aspect of our approach is that we immediately see which factors are most responsible for certain results, and more complex design tables would jeopardize this clarity and effectiveness.

• How to handle a more complex scenario with the same approach? What are the limits?

As said before, in a real scenario the uncertainty range is usually wider (more parameters involved) and more complex (correlations involved). Therefore great care should be taken when trying to reproduce a similar analysis. As mentioned above, we believe that this method is attractive because it allows an explicit evaluation of what-if scenarios and it allows decisions and conclusions to be drawn from a sound and consistent framework where every assumption is made explicit. We do not believe that the computational complexity is an issue when dealing with real case studies, not even with many uncertain parameters. In this regard, other BM software simpler than Petromod could be used; see for example Sylta (2004). We believe that the main challenges when dealing with real case studies are the parametrization of the network and of the BN distributions involved, and the choice of the discretization thresholds.
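The Dirichlet-prior idea mentioned in the first guideline can be sketched for a single row of a CPT. This is only an illustration under stated assumptions: the prior pseudo-counts `prior_alpha` stand in for a hypothetical expert opinion, while the counts (14 low, 1 high) are those of ResMmd given ResTop = low discussed in Appendix A. The function name is hypothetical.

```python
def dirichlet_posterior_mean(prior_alpha, counts):
    """Posterior mean of a categorical distribution under a Dirichlet prior.

    prior_alpha: pseudo-counts encoding expert opinion (assumed values).
    counts: number of scenarios observed in each state.
    """
    total = sum(prior_alpha) + sum(counts)
    return [(a + n) / total for a, n in zip(prior_alpha, counts)]


# Hypothetical weak, symmetric expert prior updated with the scenario counts.
posterior = dirichlet_posterior_mean(prior_alpha=[1.0, 1.0], counts=[14, 1])
```

With a symmetric prior of one pseudo-count per state, the probability of the low state moves from the purely count-based 14/15 to 15/17; larger pseudo-counts would give the expert opinion more weight relative to the basin modeling scenarios.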

7. Discussion and Conclusions

We have shown how Basin and Petroleum System Models can help in assessing the probability structure of the Bayesian Network that models prospect and play element dependencies. The workflow moves from the Earth model to the decision space. The geological and geophysical know-how is translated into BPSM. Outputs from multiple runs of basin modeling under different geologic scenarios are then used to establish the Bayesian network, which is used to test decision scenarios and to perform value of information analysis.

The work underlines the importance of assessing uncertainty in petroleum systems. The emphasis is less on knowing the right answer, which may never be known before drilling, and more on determining the range of outcomes given the available data and state of understanding of the petroleum system. Problems are caused by the complex and often non-linear interactions among the different parameters, which make the prediction problem extremely difficult. Currently these problems are addressed by running a batch of simulations with different parameters, and studying the uncertainty in the resulting distribution of accumulated volumes as the main or sole output. We believe that this process is no longer sufficient, since too many parameters remain hidden (implicit parameters) when the effect of many parameters is tested at the same time. With our framework we provide an alternative solution by making explicit all the interconnected parameters, which are not chosen arbitrarily but derived from a multiple-scenario evaluation.

Acknowledgments

We thank the Statistics for Innovation (SFI2) research center in Oslo, which partially financed GM's scholarship through the FindOil project. We acknowledge the Stanford BPSM group for the opportunity given to GM to learn and practice the software used in this work.


References

Allen, P., Allen, J., 2005. Basin Analysis, Principles and Applications. 2nd ed. Blackwell Publishing.

Bhattacharjya, D., Eidsvik, J., Mukerji, T., 2010. The value of information in spatial decision making. Mathematical Geosciences 42 (2), 141–163.

Chang, K., Fung, R., 1995. Symbolic probabilistic inference with both discrete and continuous variables. IEEE Transactions on Systems, Man and Cybernetics 25 (6), 910–917.

Cochran, W. G., Cox, G. M., 1992. Experimental Designs. Wiley.

Corre, B., Thore, P., deFeraudy, V., Vincent, G., 2000. Integrated uncertainty assessment for project evaluation and risk analysis. SPE European Petroleum Conference.

Cowell, R., Dawid, P., Lauritzen, S., Spiegelhalter, D., 2007. Probabilistic Networks and Expert Systems. Springer series in Information Science and Statistics.

Damsleth, E., Hage, A., Volden, R., 1992. Maximum information at minimum cost: A North Sea field development study with an experimental design. Journal of Petroleum Technology 44 (12), 1350–1356.

Dejean, J.-P., Blanc, G., 1999. Managing uncertainties on production predictions using integrated statistical methods. SPE Annual Technical Conference and Exhibition.

Eidsvik, J., Bhattacharjya, D., Mukerji, T., 2008. Value of information of seismic amplitude and CSEM resistivity. Geophysics 73 (4), R59–R69.

Fisher, R., 1971. The Design of Experiments, 9th Edition. Macmillan.

Friedman, N., Goldszmidt, M., 1996. Discretizing continuous attributes while learning BN. Machine Learning: Proceedings of the International Conference.

Hantschel, T., Kauerauf, A. I., 2009. Fundamentals of Basin and Petroleum Systems Modeling. Springer.

Jordan, M., 1998. Learning in graphical models. Kluwer Academic Publishers.

Kaufman, G. M., Lee, P. J., 1992. Are wildcat well outcomes dependent or independent? Working papers 3373-92, Massachusetts Institute of Technology (MIT), Sloan School of Management.

Kaufman, L., Rousseeuw, P., 2005. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley Series in Probability and Statistics.

Kolbjornsen, O., Hauge, R., Drange-Espeland, M., Buland, A., 2012. Model-based fluid factor for controlled source electromagnetic data. Geophysics 77 (1), E21–E31.

Lerche, I., 1997. Geological risk and uncertainty in oil exploration. Academic Press.

Martinelli, G., Eidsvik, J., Hauge, R., Drange-Forland, M., 2011. Bayesian networks for prospect analysis in the North Sea. AAPG Bulletin 95 (8), 1423–1442.

Martinelli, G., Eidsvik, J., Hauge, R., Hokstad, K., 2012. Strategies for petroleum exploration based on Bayesian networks: a case study, SPE paper 159722. SPE Annual Technical Conference and Exhibition, San Antonio, TX, USA, 8-10 October 2012.

Murphy, K. P., 1999. A variational approximation for Bayesian networks with discrete and continuous latent variables. Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence, UAI '99.

Sylta, O., 2004. Hydrocarbon migration modelling and exploration risk. Dr. philos, NTNU.

Tviberg, S., 2011. To assess the petroleum net present value and accumulation process in a controlled Petromod environment. Master Thesis at the Department of Geology and Mineral Resources Engineering, NTNU.

VanWees, J., Mijnlieff, H., Lutgert, J., Breunese, J., Bos, C., Rosenkranz, P., Neele, F., 2008. A Bayesian belief network approach for assessing the impact of exploration prospect interdependency: An application to predict gas discoveries in the Netherlands. AAPG Bulletin 92 (10), 1315–1336.

Wendebourg, J., 2003. Uncertainty of petroleum generation using methods of experimental design and response surface modeling: Application to the Gippsland Basin, Australia. In: AAPG/Datapages Discovery Series No. 7: Multidimensional Basin Modeling, Chapter 19. AAPG Special Volumes, pp. 295–307.

Wendebourg, J., Trabelsi, K., 2005. How wrong can it be? Understanding uncertainty in petroleum systems modelling. In: Geological Society, London, Petroleum Geology Conference series. Vol. 6. Geological Society of London, pp. 1289–1299.

Appendix A. Basic computations on Bayesian Networks

We discuss here in detail the learning procedure of the Reservoir part of the BN presented in Figure 9, and the related computations.

We use seven series of data provided by our 24 basin modelling scenarios. The data are respectively Accumulation Total, Accumulation Mmd, Accumulation Ou, Accumulation Mmd Gas, Accumulation Mmd Oil, Accumulation Ou Gas and Accumulation Ou Oil.

Most of these data are reported in Table 1, in the supplementary materials, and shown in Figure 8. Given these data we build a network with seven nodes, whose names are respectively ResTop, ResMmd, ResOu, ResMmdGas, ResMmdOil, ResOuGas, ResOuOil. The structure of the network is imposed, and it can be seen on the left side of Figure 9. It consists of a top node, ResTop, with two children, ResMmd and ResOu, each of which again has two children: ResMmdGas and ResMmdOil for the first one, and ResOuGas and ResOuOil for the second one. The distributions are learned directly from the data. We do not incorporate any prior opinion. This means that the learning process is based just on the counts of the successful cases. We show the discretised values for the seven nodes of interest in Table A.4.

Let us consider as an example the first two nodes, ResTop and ResMmd. We see that whenever the node ResTop is in state 2 (high), the node ResMmd is in state 2 as well. We also notice that only once do we observe ResTop in state 1 (low) and node ResMmd in state 2 (high). This happens in the 23rd scenario. This means that we have a 93% probability (14 times out of 15) of observing the node ResMmd in state 1 when we observe the node ResTop in state 1 as well. From these considerations we can write the conditional probability distribution of node ResMmd given node ResTop (Table A.5).
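The count-based learning step described above can be reproduced directly from the discretized values of Table A.4. The sketch below is only illustrative (the function name `learn_cpt` and the string encoding of the table rows are choices made here, not part of the original workflow); each string lists the seven node states of one scenario in the column order of Table A.4.

```python
# Rows of Table A.4, scenarios 1 to 24; one character per node, 1 = low, 2 = high.
ROWS = [
    "2221222", "1111111", "2222222", "1111111", "2212221", "1111111",
    "2221222", "1111111", "2222222", "1111111", "2212221", "1111111",
    "1121112", "1111111", "2212221", "1111111", "2212221", "1111111",
    "1121112", "1111111", "2212221", "1111111", "1212221", "1111111",
]
NODES = ["ResTop", "ResMmd", "ResOu", "ResMmdGas",
         "ResMmdOil", "ResOuGas", "ResOuOil"]


def learn_cpt(child, parent):
    """Estimate P(child | parent) from raw counts, with no prior opinion.

    Returns {parent_state: [P(child = 1), P(child = 2)]}.
    Assumes every parent state occurs at least once in the data.
    """
    c, p = NODES.index(child), NODES.index(parent)
    counts = {1: [0, 0], 2: [0, 0]}
    for row in ROWS:
        counts[int(row[p])][int(row[c]) - 1] += 1
    return {state: [n / sum(ns) for n in ns] for state, ns in counts.items()}


cpt = learn_cpt("ResMmd", "ResTop")
```

Here `cpt[1]` corresponds to the row for ResTop = low (14 out of 15 scenarios with ResMmd low) and `cpt[2]` to the row for ResTop = high.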

We can use the same procedure for all the nodes of this subnetwork, and learn all the CPTs that we need. In this way we build the joint distribution for the network. We report the marginal distributions for the seven nodes of the network in Table A.6. The complete joint distribution representation would require $2^7 = 128$ assessments, and it is therefore too large to show here. It can be derived with the same criteria, but recall that the idea of BNs is to break up the joint modelling using local CPTs.

Scenario   ResTop   ResMmd   ResOu   ResMmdGas   ResMmdOil   ResOuGas   ResOuOil
1          2        2        2       1           2           2          2
2          1        1        1       1           1           1          1
3          2        2        2       2           2           2          2
4          1        1        1       1           1           1          1
5          2        2        1       2           2           2          1
6          1        1        1       1           1           1          1
7          2        2        2       1           2           2          2
8          1        1        1       1           1           1          1
9          2        2        2       2           2           2          2
10         1        1        1       1           1           1          1
11         2        2        1       2           2           2          1
12         1        1        1       1           1           1          1
13         1        1        2       1           1           1          2
14         1        1        1       1           1           1          1
15         2        2        1       2           2           2          1
16         1        1        1       1           1           1          1
17         2        2        1       2           2           2          1
18         1        1        1       1           1           1          1
19         1        1        2       1           1           1          2
20         1        1        1       1           1           1          1
21         2        2        1       2           2           2          1
22         1        1        1       1           1           1          1
23         1        2        1       2           2           2          1
24         1        1        1       1           1           1          1

Table A.4: Discretized values for the seven nodes of the subnetwork Reservoir. The discretisation is carried out according to the clusters identified in Figure 8.

ResTop / ResMmd   low      high
low               0.9333   0.0667
high              0        1.0000

Table A.5: Conditional Probability Table for the node ResMmd given node ResTop, within the Reservoir subnetwork.

Node / State   1 (low)   2 (high)
ResTop         0.6250    0.3750
ResMmd         0.5833    0.4167
ResOu          0.7500    0.2500
ResMmdGas      0.6667    0.3333
ResMmdOil      0.5833    0.4167
ResOuGas       0.5833    0.4167
ResOuOil       0.7500    0.2500

Table A.6: Marginal prior distributions of the seven variables (nodes) within the Reservoir subnetwork.
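For the imposed structure of the Reservoir subnetwork (ResTop with children ResMmd and ResOu, each with its gas and oil children), breaking the joint distribution into local CPTs corresponds to the following factorization. It is written here as an illustration of the general BN chain rule applied to this structure; the shorthand node names inside the formula are the same as in Table A.4.

```latex
\begin{aligned}
p(\text{ResTop}, \text{ResMmd}, \text{ResOu}, \text{ResMmdGas}, \text{ResMmdOil}, \text{ResOuGas}, \text{ResOuOil})
 = {} & p(\text{ResTop})\, p(\text{ResMmd} \mid \text{ResTop})\, p(\text{ResOu} \mid \text{ResTop}) \\
 & \times p(\text{ResMmdGas} \mid \text{ResMmd})\, p(\text{ResMmdOil} \mid \text{ResMmd}) \\
 & \times p(\text{ResOuGas} \mid \text{ResOu})\, p(\text{ResOuOil} \mid \text{ResOu})
\end{aligned}
```

With binary nodes, this requires one free probability for ResTop and two for each of the six child nodes, i.e. 13 free parameters, compared with the $2^7 - 1 = 127$ free entries of the full joint table.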


In order to derive the correlation coefficients reported in Section 4.1, we can use the standard Pearson correlation coefficient formula in the special case of discrete variables. Let us consider a joint distribution of two binary random variables $x_1$ and $x_2$: in this case we have four possible outcomes: $p_{11} = p(x_1 = 1, x_2 = 1)$, $p_{21} = p(x_1 = 2, x_2 = 1)$, $p_{12} = p(x_1 = 1, x_2 = 2)$ and $p_{22} = p(x_1 = 2, x_2 = 2)$. Let us denote $p_1 = p(x_1 = 1)$ and $p_2 = p(x_2 = 1)$. Then, the correlation coefficient is:

$$\rho = \frac{p_{11}\, p_{22} - p_{12}\, p_{21}}{\sqrt{p_1\, p_2\, (1 - p_1)(1 - p_2)}}$$

In the case of the variables ResMmdGas ($x_1$) and ResMmdOil ($x_2$), for example, we have $p_{11} = 0.5833$, $p_{12} = 0.0833$, $p_{21} = 0$ and $p_{22} = 0.3333$. Furthermore, from Table A.6, we have that $p_1 = 0.6667$ and $p_2 = 0.5833$. Therefore in this case $\rho = 0.8367$, i.e. the two variables are highly correlated.
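The correlation computation for two binary variables can be checked numerically. The following is a minimal sketch (the function name is hypothetical); the four joint probabilities are expressed as scenario counts over 24 taken from Table A.4, with $x_1$ = ResMmdGas and $x_2$ = ResMmdOil.

```python
import math


def binary_correlation(p11, p12, p21, p22):
    """Pearson correlation of two binary variables from their joint pmf.

    pab = p(x1 = a, x2 = b), with states 1 (low) and 2 (high).
    """
    p1 = p11 + p12  # p(x1 = 1)
    p2 = p11 + p21  # p(x2 = 1)
    return (p11 * p22 - p12 * p21) / math.sqrt(p1 * p2 * (1 - p1) * (1 - p2))


# Counts from Table A.4: 14 scenarios (low, low), 2 (low, high),
# 0 (high, low) and 8 (high, high) for (ResMmdGas, ResMmdOil).
rho = binary_correlation(14 / 24, 2 / 24, 0 / 24, 8 / 24)
```

The marginals recovered inside the function, 16/24 and 14/24, agree with the values 0.6667 and 0.5833 of Table A.6.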

Finally, in order to derive the CPT for nodes that are not in a parent-child relation, we must propagate the information on the BN using Bayes theorem, and summing out the variables (marginalization). Therefore, for example, when we ask $p(\text{ResMmdGas} = 1 \mid \text{ResMmdOil} = 1)$, we have just the variable ResMmd in between, and we can proceed as follows:

$$
\begin{aligned}
p(\text{ResMmdGas} = 1 \mid \text{ResMmdOil} = 1)
&= \sum_{j=1}^{2} p(\text{ResMmdGas} = 1, \text{ResMmd} = j \mid \text{ResMmdOil} = 1) \\
&= \sum_{j=1}^{2} p(\text{ResMmdGas} = 1 \mid \text{ResMmd} = j, \text{ResMmdOil} = 1)\, p(\text{ResMmd} = j \mid \text{ResMmdOil} = 1) \\
&= \sum_{j=1}^{2} \frac{p(\text{ResMmdGas} = 1 \mid \text{ResMmd} = j)\, p(\text{ResMmdOil} = 1 \mid \text{ResMmd} = j)\, p(\text{ResMmd} = j)}{p(\text{ResMmdOil} = 1)}
\end{aligned}
$$

If we substitute all the values present in our original CPT, we get the result $p(\text{ResMmdGas} = 1 \mid \text{ResMmdOil} = 1) = 1$, as shown in Table 2. In the last step we have exploited the conditional independence property of the BN, i.e. a node, given its parents, is conditionally independent of all the nodes that are not its descendants. In this case therefore ResMmdGas is independent of ResMmdOil given the value of its parent ResMmd. The same procedure can be applied on much larger scales using specific propagation algorithms. One approach is the so-called Variable Elimination algorithm. A more efficient algorithm, implemented in the software package that we have used (Murphy, 1999), is called the Junction Tree Algorithm. For details about this algorithm see Jordan (1998) and Cowell et al. (2007).
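The marginalization over ResMmd can be carried out by hand in a few lines. The sketch below is not the junction tree algorithm used in the study, only the direct summation of the derivation above; the CPT entries are the count-based estimates obtained from Table A.4 (for example, ResMmdGas is low in 2 of the 10 scenarios with ResMmd high), and the variable names are hypothetical.

```python
# Count-based estimates from Table A.4 (state 1 = low, state 2 = high).
p_mmd = {1: 14 / 24, 2: 10 / 24}         # p(ResMmd = j)
p_gas1_given_mmd = {1: 1.0, 2: 0.2}      # p(ResMmdGas = 1 | ResMmd = j)
p_oil1_given_mmd = {1: 1.0, 2: 0.0}      # p(ResMmdOil = 1 | ResMmd = j)


def gas1_given_oil1():
    """p(ResMmdGas = 1 | ResMmdOil = 1), summing out the parent ResMmd."""
    # Denominator: p(ResMmdOil = 1) by the law of total probability.
    p_oil1 = sum(p_oil1_given_mmd[j] * p_mmd[j] for j in (1, 2))
    # Numerator: uses conditional independence of the two children given ResMmd.
    joint = sum(p_gas1_given_mmd[j] * p_oil1_given_mmd[j] * p_mmd[j]
                for j in (1, 2))
    return joint / p_oil1


result = gas1_given_oil1()
```

Because ResMmdOil is low only when ResMmd is low, observing ResMmdOil = 1 fixes ResMmd = 1, and the sum reduces to a single non-zero term.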
