Facies Characterization of a Reservoir in the North Sea...

1

FaciesCharacterizationofaReservoirintheNorthSeaUsingMachineLearningTechniques

ProjectFinalReportforCS229(Aut2016-17)

PeipeiLi([email protected]);YuranZhang([email protected])

IntroductionIn oil and gas industry, oil is produced from wells drilled into the oil reservoirs. A thoroughunderstanding and characterization of the subsurface is crucial to sustainablemanagement of areservoir. The subsurface, however, is largely heterogeneous and oil is not present everywhere.Forourstudyarea,thereservoirrockiscategorizedasthreefacies:brinesand,oilsandandshale,andit isonlyoilsandthatweconsiderexploitable.Thefocusofthisprojectistoidentifyoilsandbasedonwelllogdataandseismicdatausingmachinelearningalgorithms.

Well loggingisapowerfultool forreservoirengineerstoobtaininformationfromthesubsurface,whereawirelinetoolisloweredintotheborehole,alongwhichmeasurementsweretakensothatrockpropertiescanbeinferredfromthemeasureddata.Commonwelllogsincludegammaray,P-wavevelocity,S-wavevelocity,density,etc.Gammarayenablesustodifferentiatedifferent faices(sand and shale) at thewellbore directly, and all the other logs are indirect indicators of facies.However,one limitationofwell logs is that it isnotable toprovide informationon the reservoirthat is further away from the wellbore. On the other hand, seismic reflection data is generallygathered inmore extensive area and can provide information for thewhole 3D reservoir.Whileseismicdatadoesn’tcontaindirectfaciesinformation,itcanbeusedtogetP-wavevelocity,S-wavevelocity,density,andotherreservoirpropertiesusingseismicinversiontechniques.

Sothegeneralideaoffaciescharacterizationisasfollows.First,usingwelllogdata,aclassificationmodelisbuilttoclassifyfaciesusingreservoirpropertieslikeP-wavevelocity,S-wavevelocityanddensity.Thenseismicinversionisdoneonseismicreflectiondatatoinvertforthesepropertiesinthewhole3Darea.Oncethesepropertiesareobtainedfromseismicdata,theclassificationmodelbuiltearliercanbeappliedtodopredictiononthewhole3Darea.

Facies characterization using seismic data and statistical rock physics has been a useful tool inreservoirexploration.However,thepreviousmethodsmostlyfocusonseismicinversionotherthanthe statistical analysis [1][2][3][4][5]. In these methods, statistical rock physics is never fullyexploited usingmachine learningmethods. Instead, they are done qualitatively (scatter plot) oronlytwopropertiesareusedtodosimpleclassification.

Inthisproject,wefocusonthestatisticalrockphysicspartwherewewillfullyexploitclassificationoffaciesusingwelllogdata.VariousmachinelearningalgorithmssuchasGDA,logisticregression,RandomForestandSVMwereexploredandcomparedtofindthebestmodel.

DataProcessingThereservoirofinterestinthisprojectisintheNorthSea.Wehaveonewellwithacompletesetofwelllogsavailable,whichwereusedontrainingandselectingclassificationmodels.

The machine learning problem we propose is actually a multi-class classification problem. Thelabelsarethethreefacies:shale,brinesandandoilsandwhicharecreatedusingGRlogsandfluid

2

substitution. The features are three other well logs (𝑉!,𝑉!and𝑅ℎ𝑜𝑏) and six more propertiescalculatedusingavailablewelllogs.

FaciesIdentificationGRlogdata isconsideredasdirect facies indicator.Combinedwithothergeologicalobservations,GRlogservesasthe“correctanswer”ofthelabelsofthetrainingdata.Asshowninfigure1,shaleandsandsamplesareselectedusingGR logandmarkedrespectively inredandblue.Therestofreservoir (black) is considered as interbedded shale and sand (not pure shale or sand) and thuscannotbeused inthemodelingprocess.AlongwiththeGR logused in facies identification,otherwell logs that will be used as features in building the statistical models are also displayed asreference,where𝑉!,𝑉!and𝑅ℎ𝑜𝑏are respectively P-wave velocity, S-wave velocity and density ofthereservoiratthewelllocation.

Intotal,thereare394shalesamplesand492sandsamplesselectedavailableformodeling.Thesesand samples are considered as pure brine sand even though some of the sand is actually in oilzone. That’s becausemud used in the drilling and logging process is filtered into the formationwhileloggingtoolscanonlydetectveryshallowformation.Thus,it’sactuallymeasuringpropertiesofbrinesand.Toretrieveoilsandproperties,Gassmannfluidsubstitutionisusedtoreplacewaterwithoil.Bydoingthis,492oilsandsampleswerecreatedasthethirdclass.

Therefore,thetotalnumberofsampleswehaveis1378.Aswell logdataisonlyavailableforonewell,wefurtherrandomlysampled80%ofthedataformodeldevelopment,andsaved20%ofthedataforfinaltestingandmodelcomparison.

Figure 1. Gamma ray , Vp, Vs and density log data of the reservoir at the well location. Only pure shale and sand

data were used for building the model.

FeatureCreationAsshowninFigure1,theavailablewelllogs𝑉!,𝑉!and𝑅ℎ𝑜𝑏areindeedrelevanttofacies.Actually,inadditiontothesethree,therearesomeotherreservoirpropertiesthatcouldalsobeaffectedbydifferent facies. These reservoir properties are basically non-linear functions of the above threebasicreservoirproperties.Byapplyingthephysicalfunctions,sixmorefeatureswerecreatedbased

3

onthethreeavailablewelllogs.Theseextrafeaturesinclude:shearmodulus(𝜇 = 𝑅ℎ𝑜𝑏 ∙ 𝑉!!),bulkmodulus (𝐾 = 𝑅ℎ𝑜𝑏 ∙ (𝑉!!−

!!𝑉!!) , P-wave impedance ( 𝐼! = 𝑉! ∙ 𝑅ℎ𝑜𝑏 ), S-wave impedance

(𝐼! = 𝑉! ∙ 𝑅ℎ𝑜𝑏), Poisson’s ratio(𝜈 = !!!!!!!!

!(!!!!!!!)) and Lame’s coefficient(𝜆 = 𝑅ℎ𝑜𝑏 ∙ 𝑉!!−

!!𝑉!! −

!!𝑅ℎ𝑜𝑏 ∙ 𝑉!!).Thuswehave9featuresintotal.

ImplementationofMachineLearningAlgorithmsTwolinearclassifiers(SoftmaxRegressionandGaussianDiscriminantAnalysis)andtwonon-linearclassifiers (Random Forest and Support Vector Machine) are explored to solve the multi-classification problem. 80% of the data was used for training, such as feature selection andparametertuning,andtherest20%wasusedforfinaltestingandmodelcomparison.

SoftmaxRegressionSoftmaxregressionisageneralizedlinearmodelthatassumesthedataasdistributedaccordingtoamultinomialdistribution.Thesoftmaxfunctionis:

𝑝 𝑦 = 𝑖 𝑥; 𝜃 = 𝜙! =𝑒!!

!!

𝑒!!!!!

!!!

The “multinom” function from the “nnet” package in R was used to effectively conduct softmaxregressiononthedata.Inordertofindoutwhichfeaturesaremost“relevant”tothelearningtask,forwardsearchfeatureselectionwasappliedinordertofindthebestfeaturesubset.Eachfeaturecombination in the forward searchmethodwasevaluatedusingk-fold cross validation (withk=10).Giventhefactthatthereare1000+datasamplesavailableformodeling,k-foldcrossvalidationnot only holds out merely 10% of data in each training iteration, but is also computationallyinexpensivesincethesamplesizeisnottoolarge.

ThefeaturesetFobtainedfromforwardsearchmethodis:{"𝑅ℎ𝑜𝑏","𝜇","𝑉!","𝐼!","𝜈","𝐼!","𝑉!","𝜆","𝐾"}. It was found that the cross validation error (CV error) is lowest when ~5 features wereconsidered,asillustratedinFigure2(left).Whenthenumberoffeaturesfurtherincreasedbeyond5,thetestingerrorincreasedslightly,indicatingasmalldegreeofoverfitting.

GaussianDiscriminantAnalysis(GDA)GDAisagenerativelearningalgorithm.Ittriestofindalineartransformationtomaximizetheso-calledRayleighcoefficient, that is, theratioof thedeterminantof the inter-classscattermatrixoftheprojectedsamplestotheintra-classscattermatrixoftheprojectedsamples[6]:

𝐽 Φ =Φ!Σ!ΦΦ!Σ!Φ

WhereΣ! andΣ! arerespectivelytheinter-classandintra-classmatrix.

The“lda”functionfrom“mass”libraryinRwasusedtoconductGDAonthedata.Forwardsearchfeature selection via k-fold cross validation is applied as described in the Softmax Regressionsection.ThefeaturesetFobtainedfromforwardsearchmethodis:{"𝑅ℎ𝑜𝑏","𝜇","𝑉!","𝑉!","𝐼!","𝜆","𝐼!","𝐾","𝜈"}.ItwasfoundthattheCVerrorwaslowestwhen7featureswereconsideredinmodeltraining,asillustratedinFigure2(right).

4

Figure 2. Error vs #features for the Softmax Regression model, indicating that 5 features yields lowest error (left);

Error vs #features for the GDA model, indicating that for GDA, 7 features yields lowest error (right).

RandomForestRandom Forest is a special case of baggingmethods for decision trees. “randomForest” functionfrom “randomForest” library inRwasused toperformRandomForest classificationon thedata.Weusedk-foldcrossvalidation(k=10)todeterminethebest“mtry”valueforclassification,where“mtry” denotes the number of variables randomly sampled as candidates at each split [7]. Asindicated in the figure below,mtry = 5 yields the lowest CV error. Note that all possible “mtry”valuesgiveverylowerror,andthelowesterrorisnotmuchdifferentfromtherest.

Figure 3. Parameter selection results for SVM (left) and Random Forest (right)

SupportVectorMachine(SVM)SupportVectorMachineisamaximummarginclassifier.Ittriestosolvethefollowingoptimizationproblem:

𝑚𝑖𝑛!,!,! 12𝜔 ! + 𝐶 𝜉!

!

!!!

, 𝑠. 𝑡. 𝑦 ! 𝜔!𝑥 ! + 𝑏 ≥ 1 − 𝜉! , 𝜉! ≥ 0, 𝑖 = 1⋯𝑚

“svm” function from “e1071” library inRwasused toperformclassificationon thedata [8].ThekernelfunctionusedinthisprojectisGaussianradialbasisfunction.ArangeofthecostparameterCisevaluated,andtheresultisoptimalwhenC=1024,asillustratedinFigure2.

5

ModelComparison&Results&DiscussionNowwerebuildeachmodelusingselected featuresorparameters in theprevioussectionon thetrainingdataandcomparethemodelsusingthetestingdata.ThetrainingerrorandtestingerroraresummarizedinTable1.

Table 1. Model comparison with MSE

Asshowninthetable,non-linearmodelsgenerallyoutperformlinearmodels,asindicatedbytheirlower training and testing error. For eachmodel, testing errors are slightly higher than trainingerrors,indicatingpropermodelcomplexity.

Togetmoreinsightsonmodelperformance,wegeneratedtheconfusionmatrixforeachmodel,asillustratedinFigure4.ItcanbeseenthattheresultsareconsistentwithmodelcomparisonusingMSE.Allmodelsperformedwellindiscriminatingsandfromshale,althoughdifferentiatingoilsandfrombrinesandturnedouttobeslightlylessoptimal.

Amongall fouralgorithms,SVMachievesbestclassificationresults,especially in termsof the lownumberoffalsepositivepredictionsofoilsand.

Figure 4. Model comparison with confusion matrix

FutureWorkWiththeconclusionabove,thenextstepistoapplytheSVMto3Ddatasetforfaciesclassification.However, we only have P-wave impedance and S-wave impedance inverted from seismic dataavailableforuse.Asareference,webuiltasoftmaxregressionmodelusingthesetwofeaturesonlyand it showed an error of 0.23. So we think getting more seismic attributes could improve theclassificationsignificantly.

6

References[1] Avseth, P., et al. "Seismic reservoirmapping from 3-DAVO in aNorth Sea turbidite system."Geophysics66.4(2001):1157-1176.

[2] Mukerji, T., et al. "Mapping lithofacies and pore-fluid probabilities in a North Sea reservoir:Seismicinversionsandstatisticalrockphysics."Geophysics66.4(2001):988-1001.

[3]Mukerji,Tapan,etal."Statisticalrockphysics:Combiningrockphysics,informationtheory,andgeostatistics to reduce uncertainty in seismic reservoir characterization."The Leading Edge20.3(2001):313-319.

[4] Avseth, Per Åge.Combining rock physics and sedimentology for seismic reservoircharacterizationofNorthSeaturbiditesystems.Diss.StanfordUniversity,2000.

[5] Bosch, Miguel, Tapan Mukerji, and Ezequiel F. Gonzalez. "Seismic inversion for reservoirpropertiescombiningstatisticalrockphysicsandgeostatistics:Areview."Geophysics75.5(2010):75A165-75A176.

[6] Li, Tao, Shenghuo Zhu, and Mitsunori Ogihara. "Using discriminant analysis for multi-classclassification: an experimental investigation."Knowledge and information systems10.4 (2006):453-472.

[7]Liaw,Andy,andMatthewWiener."ClassificationandregressionbyrandomForest."Rnews2.3(2002):18-22.

[8]Meyer,David,andFHTechnikumWien."Supportvectormachines."The Interface to libsvminpackagee1071(2015).

Facies Characterization of a Reservoir in the North Sea...

Documents

Transcript of Facies Characterization of a Reservoir in the North Sea...