Research Article

Geological Type Recognition by Machine Learning on In-Situ Data of EPB Tunnel Boring Machines

Qian Zhang,1 Kaihong Yang,1 Lihui Wang,2 and Siyang Zhou1

1 Key Laboratory of Modern Engineering Mechanics, School of Mechanical Engineering, Tianjin University, Tianjin 300072, China
2 Department of Military Vehicle, Academy of Military Transportation, Tianjin 300161, China

Correspondence should be addressed to Siyang Zhou; xiaodaidaizyy@163.com

Received 6 November 2019; Revised 25 March 2020; Accepted 30 March 2020; Published 27 April 2020

Academic Editor: Akhil Garg

Copyright © 2020 Qian Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

At present, many types of large-scale engineering equipment can acquire massive in-situ data at runtime. In-depth data mining is conducive to real-time understanding of equipment operating status and to recognition of the service environment. This paper proposes a geological type recognition system based on the analysis of in-situ data recorded during TBM tunneling, in order to address geological information acquisition during TBM construction. Owing to the high dimensionality of TBM in-situ data and the nonlinear coupling between its parameters, dimensionality reduction, feature engineering, and machine learning methods are introduced into the analysis. Because TBM parameters do not follow common distributions, the chi-square test is used to screen for sensitive features. Considering the complex relationships involved, the ANN, SVM, KNN, and CART algorithms are used to construct geology recognition classifiers. A case study of a subway tunnel project in China constructed using an earth pressure balance tunnel boring machine (EPB-TBM) is used to verify the effectiveness of the proposed method. The results show that the recognition accuracy gradually increases to a stable level as input features are added and that the accuracy of all algorithms is higher than 97%. Seven features are considered the best selection strategy for SVM, KNN, and ANN, while feature selection is an inherent part of the CART method, which also shows good recognition performance. This work provides an intelligent path for obtaining geological information in underground TBM excavation projects and a possible route to engineering recognition of more complex geological conditions.

1. Introduction

With the rapid development of sensor technology, an increasing number of large-scale engineering machines can provide rich monitoring data in real time during construction. These data contain a large number of control rules related to equipment operation. Intelligent analysis of engineering monitoring data can provide a new path for research on complex engineering problems and offer a decision-making basis for intelligent control of the engineering equipment.

A tunnel boring machine (TBM) is a type of large-scale engineering equipment that is widely used in tunneling construction. This equipment combines the functions of soil cutting, soil debris conveying, and tunnel supporting to achieve fully mechanized construction of tunnel engineering [1] with a high degree of safety and construction efficiency [2]. A schematic diagram of TBM construction is shown in Figure 1. A TBM consists of many parts, including a cutter head, TBM head, gripper, guniting, jumbolter, and belt conveyor [3]. Functions such as excavation, support, and guidance are carried out by different mechanical components and by the synergy of the overall mechanical system. During TBM excavation, the construction strategy must be adjusted continuously based on the operational state and on information about the surrounding geological environment, which is an important basis for the safe and efficient operation of the machine [4]. However, because a TBM excavates underground, the poor service conditions and complex muddy environment make it very inconvenient to observe its construction state. Therefore, the monitoring data acquired and recorded by sensors mounted on the main parts of the machine form an important basis for understanding the working state of the equipment [5–8]. Current TBMs can simultaneously monitor hundreds of operation parameters, such as the tunneling rate, cutter head rotational speed, cylinder thrust, cutter head torque, and sealed chamber pressure. These in-situ data contain rich information on interactions between the machine and the surrounding environment. The rapid development of technologies such as big data and artificial intelligence in recent years provides more effective methods for thorough exploration of the information in in-situ data, and thus for realizing the informatization and intelligence of TBM construction [9].

Hindawi, Mathematical Problems in Engineering, Volume 2020, Article ID 3057893, 10 pages, https://doi.org/10.1155/2020/3057893

In the early stage, TBM engineering data were used to establish empirical models for conveniently solving practical problems. For example, Krause [10] used data from hundreds of TBM construction projects in Germany and Japan to derive an empirical prediction range for the tunneling load. Other classic empirical models include the NTNU model [11], developed by the Norwegian University of Science and Technology, and the improved model of Bruland [12], both of which are often used in engineering to predict the rate of penetration. Building on empirical models, many scholars have proposed parameter prediction models based on statistical analysis of engineering data. For example, Zhang et al. [13] established a tunneling load prediction model for the earth pressure balance tunnel boring machine (EPB-TBM) by combining regression analysis of engineering data with dimensional analysis. Avunduk and Copur [14] established a nonlinear regression model of the rate of penetration from several soil property parameters, such as particle size distribution and natural water content. Macias et al. [15] analyzed how the predicted rate of penetration of a hard rock TBM changes under different fracturing conditions and identified the fracturing coefficient as an effective index of the influence of rock fracturing on tunneling performance. Armetti et al. [16] regressed the single indices affecting the rate of penetration to analyze the degree to which different parameters in the empirical model influence tunneling performance. Vergara and Saroglou [17] established a regression relationship between the weighted rock mass rating and the mixed-face penetration index under mixed geology, considering the proportion of rock and soil in the tunneling face. Yagiz et al. [18] set up a regression equation to predict tunneling performance in jointed and faulted rock based on rock properties such as the distance between planes of weakness and the orientation of discontinuities in the rock mass. Work based on TBM in-situ data has thus focused mainly on basic statistical regression of key tunneling performance parameters, with limitations on the problems it applies to and on the number of features that can be considered. Statistical regression can extract and describe the rules in the data, but a TBM is a complex engineering system with hundreds of parameters collected during operation. Moreover, TBM in-situ data are characterized by many influencing factors and nonlinear coupling between parameters, which makes in-depth data mining difficult [19]. The valuable information hidden within the massive monitoring data remains to be explored.

In recent years, machine learning algorithms have developed rapidly. Because of their excellent nonlinear expression ability and adaptability to massive data, they provide powerful tools for TBM in-situ data analysis. Some typical works are as follows. Bouayad and Emeriault [20] established a prediction model of the ground settlement caused by an earth pressure balance shield machine using principal component analysis (PCA) and an adaptive neuro-fuzzy inference system (ANFIS). Mahdevri et al. [21] used a support vector machine and an artificial neural network to predict the tunnel convergence caused by ground compression and verified the model outputs against measured data from engineering examples. Hyun et al. [22] combined fault tree analysis (FTA) and the analytic hierarchy process (AHP) to analyze the risk and probability of shield construction and constructed a risk management system with good consistency. Salimi et al. [23] used nonlinear regression and artificial intelligence algorithms to predict the performance of a hard rock TBM. Sun et al. [24] established a random forest model to predict the dynamic load of shield tunneling. Gholamnejad and Tayarani [25] used an artificial neural network to predict the rate of penetration from three rock mass parameters (uniaxial tensile strength, rock quality index, and weak face spacing) and evaluated the results with different hidden-layer settings. Adoko et al. [26] proposed a Bayesian method to select the performance of different tunneling machines. Seker and Ocak [27] compared random forest and other ensemble learning algorithms for predicting the rate of penetration. Gao et al. [28] used several kinds of recurrent neural networks to analyze the sequential patterns of TBM performance parameters so as to predict important performance parameters in advance.

Figure 1: Schematic diagram of TBM underground construction.

Previous studies have shown that machine learning methods can be used for multiparameter analysis of TBM data. These results also indicate that changes in geological type during TBM driving are reflected in the in-situ data through the interaction between the machine and the ground. Because a TBM excavates underground, it may encounter a variety of geological conditions. The geology varies greatly between projects, for example, soft soil, hard rock, and composite ground. Geological conditions are therefore important factors affecting a project, and geological type recognition is one of the major tasks in TBM engineering. It is thus feasible to identify different construction geological types by analyzing the relationship between TBM in-situ data and geological conditions.

In this paper, feature selection and machine learning methods are introduced into engineering data analysis to propose a geological recognition system based on in-situ data analysis during tunneling. The proposed method provides an effective way to acquire geological information for construction decision-making. The parameters that are sensitive to changes in geological type are selected as input features by a feature engineering algorithm for dimension reduction. Four machine learning classification algorithms, KNN (k-nearest neighbor), SVM (support vector machine), ANN (artificial neural network), and CART (classification and regression tree), are then trained on data labeled with different geological types, and the recognition performance is evaluated on an independent test set. Through these steps, the sensitive features are extracted from the in-situ data and the geological recognition system is established. A subway tunnel project constructed by an earth pressure balance tunnel boring machine (EPB-TBM) is taken as a case study to discuss the effectiveness of the above methods. The procedure of the proposed TBM geological recognition system is shown in Figure 2.

2. Methods

The geological recognition system proposed in this paper consists of three main steps. First, normalization preprocessing is performed to reduce the dominant effects caused by differences in dimension and order of magnitude between parameters in the TBM in-situ data. Second, the chi-square test, a nonparametric test method used in feature selection, is applied to select the key parameters that are highly sensitive to geological variation as input features. Third, several typical machine learning classification algorithms are trained on the data sets with geological labels to obtain the geological recognition classifier, which is used to perform geological type recognition. The test set data are used to validate the accuracy of the geological recognition system and to evaluate the effectiveness of the method.

2.1. Data Preprocessing. During TBM excavation, the data acquisition system records in real time hundreds of engineering parameters related to machine operation, including the cylinder thrust, motor torque, cutter head rotational speed, advance rate, guiding attitude, and sealed chamber pressure. These engineering parameters have various dimensions, and their numerical magnitudes differ greatly. For example, the cylinder thrust can reach tens of thousands of kN, while the advance rate is usually only tens of millimeters per minute; both are important factors reflecting the operating state of the machine in different geological conditions.

Figure 2: Flow chart of the geological identification system. Preprocessing: the TBM in-situ data are min-max normalized and combined with geological labels from exploration information to form a preprocessed data set with labels. Feature engineering and model training: a chi-square test is performed, features are added to the input in descending order of chi-square value, a machine learning classification algorithm is trained, and 10-fold cross validation is applied; the loop repeats until the accuracy requirement is reached, yielding the geological recognition classifier.

Most feature selection and machine learning algorithms are not invariant to scale. To prevent certain parameters from playing a dominant role in data mining because of differences in order of magnitude, all the parameters in this work are min-max normalized before machine learning classification. The calculation method is

$$x_{\mathrm{pre}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \tag{1}$$

where $x_{\mathrm{pre}}$ is the dimensionless value after normalization, $x_{\min}$ is the minimum value in the recorded data of the parameter, and $x_{\max}$ is the maximum value in the recorded data of the parameter.

Min-max normalization converts parameters from dimensional to dimensionless form and maps them to the interval from 0 to 1, so that parameters with different dimensions and orders of magnitude are treated as equally as possible in the subsequent analysis. In addition, normalization helps to improve the convergence speed and the results of the numerical solution.
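As a minimal sketch of the min-max normalization in equation (1), the following Python function rescales one parameter at a time. The function name, the sample readings, and the handling of a constant parameter are illustrative assumptions and are not taken from the paper.

```python
def min_max_normalize(values):
    """Map a list of numbers to the interval [0, 1] following Eq. (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant parameter has no range; mapping it to 0.0 is an
        # assumption made here, since the paper does not discuss this case.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical readings that only mimic the orders of magnitude
# mentioned in the text (thrust in kN, advance rate in mm/min).
thrust = [10000.0, 15000.0, 25000.0]
advance_rate = [20.0, 30.0, 50.0]

thrust_norm = min_max_normalize(thrust)
advance_norm = min_max_normalize(advance_rate)
```

After normalization both parameters span the same range, so neither dominates a distance-based or gradient-based algorithm merely because of its units.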

2.2. Feature Engineering. As mentioned above, the in-situ data include records of hundreds of engineering parameters, whereas in existing engineering practice only a few parameters, such as cylinder thrust and motor torque, are used to analyze the geological conditions [29, 30]. To achieve effective geological recognition, however, it is important to investigate fully how the parameters in the data vary with the geological conditions. To this end, the parameters that are highly sensitive to geological changes must be considered more comprehensively and selected as the input features for the subsequent machine learning; that is, feature engineering must be conducted. Through this step, redundant parameters with low correlation with geological changes are removed while the informative parameters are retained. This helps to improve the recognition accuracy, reduce the empirical risk, and avoid the overfitting caused by incorrect generalization from the accidental behavior of certain parameters.

Engineering data often do not follow common distributions, and the relationships among many parameters cannot be explained by independent statistical analysis. Instead, the target variable is influenced by a combination of parameters [31]. Therefore, this work uses the chi-square test for feature engineering. The chi-square test is a nonparametric test method that measures the deviation between the observed value and the theoretical value expected under the independence assumption, and it makes no assumptions about the data distribution. Hence, this method is suitable for the engineering data in this research. Its basic principle is to evaluate parameter independence by calculating the deviation between the observed value and the expected one. The specific calculation formula is

$$\chi^2 = \sum_{i=1}^{k} \frac{(N_i - E_i)^2}{E_i}, \tag{2}$$

where $\chi^2$ is the chi-square value of the parameter, $k$ is the number of recorded values, $N_i$ is the actual (observed) value, and $E_i$ is the expected value. $\chi^2$ measures the degree to which the expected and actual values deviate from each other. A high value of $\chi^2$ indicates that the independence hypothesis is incorrect; that is, the parameter, as an input feature, helps to judge whether a certain kind of event occurs.
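The chi-square statistic of equation (2) can be illustrated with a small sketch. The paper does not state how continuous parameters are turned into counts, so the version below assumes that each normalized parameter has already been discretized into bins and computes the statistic from the bin-by-label contingency table. The function name, the bin names, and the toy data are hypothetical.

```python
from collections import Counter


def chi_square_statistic(feature_bins, labels):
    """Chi-square statistic of Eq. (2) for a binned feature against class labels."""
    n = len(labels)
    bin_totals = Counter(feature_bins)
    label_totals = Counter(labels)
    observed = Counter(zip(feature_bins, labels))  # N_i for each (bin, label) cell
    chi2 = 0.0
    for b, n_bin in bin_totals.items():
        for c, n_label in label_totals.items():
            expected = n_bin * n_label / n  # E_i under the independence hypothesis
            chi2 += (observed[(b, c)] - expected) ** 2 / expected
    return chi2


# Toy geological labels (not data from the paper).
labels = ["clay", "clay", "silt", "silt"]
informative = ["low", "low", "high", "high"]    # bins follow the labels exactly
uninformative = ["low", "high", "low", "high"]  # bins are unrelated to the labels

chi2_informative = chi_square_statistic(informative, labels)
chi2_uninformative = chi_square_statistic(uninformative, labels)
```

The informative feature receives the larger statistic, which is the ordering used to rank parameters by sensitivity to the geological label.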

The geological type recognition problem addressed in this work is essentially a supervised classification problem. For the training set, the data from construction areas with known geological information are marked with geological type labels for supervised learning. Using the geological label as the target, the chi-square test is performed on the training set to yield the chi-square value of each parameter. The values are sorted from largest to smallest, and the first few parameters, that is, those with the highest sensitivity, are selected as the input features of the subsequent recognition algorithm. After testing and validation, the input features with the best recognition performance are selected as the optimal input features.

Because of the long distance of TBM construction and irregular geological changes, TBM in-situ data are massive and their information is nonuniformly distributed. To evaluate more effectively the impact of different feature selection strategies on the performance of the geological recognition system, this paper uses the 10-fold cross-validation method [32], whose basic idea is shown in Figure 3. The dataset is divided into ten subsets with similar amounts of data. Each subset is selected in turn, without repetition, as the test set, and the remaining nine subsets are used as the training set, until every subset has served as the test set once. Finally, the evaluation values from the ten test sets are averaged and taken as the final evaluation value of the 10-fold cross-validation. In this way, the contingency and randomness caused by the use of a single test set are avoided as far as possible, giving insight into how the model will generalize to an independent dataset.
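The 10-fold procedure described above can be sketched as follows. This is an illustration under stated assumptions: samples are assigned to folds by interleaving their indices, whereas the paper does not say how the subsets are drawn, and `evaluate` stands for any hypothetical function that trains on one index set and returns a score on the other.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k folds whose sizes differ by at most one."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def cross_validate(n_samples, evaluate, k=10):
    """Average evaluate(train_idx, test_idx) over the k train/test splits."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one forms the training set.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k
```

Each sample is used for testing exactly once and for training nine times, so the averaged score does not hinge on one particular split.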

2.3. Applied Algorithms and Classification Metrics. Considering the characteristics of TBM in-situ data, including high dimensionality, nonlinear coupling of parameters, and high noise, four commonly used supervised classification algorithms, namely, KNN (k-nearest neighbor), SVM (support vector machine), ANN (artificial neural network), and CART (classification and regression tree), are selected in this study to express the relationships between the input features and geological labels and to establish the corresponding geological type recognition systems.

As shown in Figure 4(a), the k-nearest-neighbor (KNN) [33] algorithm is an instance-based method that predicts the class of a point from the properties of the K sample points closest to it in the feature space. The principle is simple, and the method adapts readily to multiclass tasks.
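A minimal sketch of the KNN decision rule, assuming Euclidean distance and a simple majority vote, is given below. The function name and the two-feature toy samples are hypothetical and only illustrate the mechanism.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest samples."""
    # Pair every training sample's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy normalized samples with two features each (not data from the paper).
train_x = [(0.10, 0.20), (0.15, 0.25), (0.20, 0.10),
           (0.80, 0.90), (0.85, 0.80), (0.90, 0.95)]
train_y = ["muddy clay"] * 3 + ["silty clay"] * 3

prediction_a = knn_predict(train_x, train_y, (0.12, 0.18))
prediction_b = knn_predict(train_x, train_y, (0.88, 0.90))
```

Note that all the work happens at prediction time: every query is compared with every stored sample.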

The support vector machine (SVM) [34], illustrated in Figure 4(b), is a geometric method that finds the optimal separating hyperplane through support vectors. In the nonlinear case, SVM maps the problem from the original space to a high-dimensional space through a kernel function. It needs only a few support vectors to make decisions and adapts well to high-dimensional problems, making it one of the most widely used machine learning methods.
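To make the role of the kernel function concrete, the sketch below implements the radial basis function kernel, which Section 3.1 names as the kernel used for SVM in this work. The function name and the default value of `gamma` are assumptions for illustration; the paper does not report its kernel parameters.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


same_point = rbf_kernel((0.2, 0.7), (0.2, 0.7))
unit_apart = rbf_kernel((0.0, 0.0), (1.0, 0.0))
```

The kernel equals 1 for identical points and decays toward 0 as points move apart, which is how similarity in the implicit high-dimensional space is measured without computing the mapping explicitly.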

The basic principle of the artificial neural network (ANN), a nonlinear fitting model inspired by the biological neural system [35], is shown in Figure 4(c). It is mainly composed of an input layer, hidden layers, and an output layer. The hidden layers give the model its nonlinear properties through the network structure and activation functions. Because of its strong nonlinear expression ability, it has become one of the most popular machine learning methods in recent years.
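The layered structure can be illustrated by a forward pass through a very small network. This sketch assumes one hidden layer with a tanh activation and a softmax output, and all weights are hypothetical; the network used in this work has more and wider hidden layers (see Section 3.1), and its activation function is not stated in the paper.

```python
import math


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one weight row per node."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> tanh hidden layer -> softmax output layer."""
    hidden = dense(x, hidden_w, hidden_b, math.tanh)
    logits = dense(hidden, out_w, out_b, lambda z: z)
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights for two input features, two hidden nodes, two classes.
probs = forward(
    [0.2, 0.7],
    hidden_w=[[0.5, -0.3], [0.1, 0.8]], hidden_b=[0.0, 0.1],
    out_w=[[1.0, -1.0], [-1.0, 1.0]], out_b=[0.0, 0.0],
)
```

The output is a probability for each class; training would adjust the weights so that the probability of the labeled geological type is maximized.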

As shown in Figure 4(d), the classification and regression tree (CART) [36] method is a decision tree method. Using the Gini index, it repeatedly searches for the best feature and the best split point and divides the data into a binary tree until the whole data set is classified. The most notable characteristic of the CART algorithm is that it provides a clear, even visual, decision-making process, which offers useful guidance in practical engineering.
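The split search at the heart of CART can be sketched for a single feature as follows. The helper names and the toy data are hypothetical, and taking midpoints between adjacent distinct values as candidate thresholds is a common convention assumed here rather than a detail given in the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best binary split on one feature."""
    best = (None, gini(labels))  # no split: impurity of the whole node
    candidates = sorted(set(values))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy normalized feature values and labels (not data from the paper).
values = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = ["clay", "clay", "clay", "silt", "silt", "silt"]
threshold, impurity = best_split(values, labels)
```

A full tree repeats this search over all features at every node; features that never yield a useful split are simply never used, which is why feature selection is inherent to the method.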

Figure 3: Diagram of 10-fold cross validation. The dataset D is divided into subsets D1 to D10; in each of ten rounds one subset serves as the test set and the other nine as the training set, giving Evaluation 1 to Evaluation 10, whose average is the evaluation of the 10-fold cross validation.

Figure 4: Basic principles of four classification algorithms: (a) KNN; (b) SVM; (c) ANN; (d) CART.


To quantify the quality of predictions, several metrics are adopted. In supervised classification problems, the accuracy (AR), precision (PR), recall (RE), and F1-score (F1) are the most commonly used indices for evaluating classifier performance. In addition, the confusion matrix is a format used to show classification results. For example, the confusion matrix for a binary classification problem is shown in Table 1, where a true positive (TP) is a positive sample predicted as positive, a true negative (TN) is a negative sample predicted as negative, a false positive (FP) is a negative sample predicted as positive (a type I error), and a false negative (FN) is a positive sample predicted as negative (a type II error).

Based on a given confusion matrix, the accuracy, precision, and recall can be calculated. The accuracy is the most common classification metric and is the number of correctly classified samples divided by the total number of samples. The precision is the proportion of correctly classified samples among the samples predicted to be of a certain class. The recall is a measure of coverage and is the proportion of correctly classified samples among the samples that actually belong to a certain class. Precision and recall sometimes conflict with each other: high precision is often accompanied by low recall, and vice versa. The F1-score is a comprehensive evaluation of these two metrics. In this paper, these four indices are used to evaluate the geological recognition results. Each evaluation index is calculated as follows:

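$$\begin{aligned}
\mathrm{AR} &= \frac{TP + TN}{TP + TN + FP + FN}, \\
\mathrm{PR} &= \frac{TP}{TP + FP}, \\
\mathrm{RE} &= \frac{TP}{TP + FN}, \\
\mathrm{F1} &= \frac{2 \times \mathrm{PR} \times \mathrm{RE}}{\mathrm{PR} + \mathrm{RE}}.
\end{aligned} \tag{3}$$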
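The four indices in equation (3) can be computed directly from the confusion-matrix counts, as in the sketch below. The function name and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, F1-score) from binary confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical counts, chosen only to exercise the formulas.
accuracy, precision, recall, f1 = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```

For these counts the precision (90/105) is lower than the recall (90/100), and the F1-score falls between the two.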

3. Results and Discussion

The method proposed in this paper is applied to geological recognition in actual tunnel engineering, and this section discusses the applicability and effectiveness of the method and the recognition performance of the different classification algorithms. As a preliminary study testing the feasibility of using machine learning to recognize geological types, this paper considers Tianjin Metro Line 9 and Tianjin Metro Line 3, both of which mainly pass through soft soil. A few geological types are involved in the data sections, such as muddy clay, silt, and silty clay. The section of Tianjin Metro Line 9 is approximately 1104 m long and was constructed using an EPB-TBM. The construction area of this project mainly passed through soft soil such as silty clay, muddy clay, and silty soil. The engineering data used in this paper contain 357 parameters recorded by the data acquisition system during construction. The sampling interval was approximately 30 s (an advance of approximately 17 mm). In the application, the dataset is divided into training and test sets in given proportions. The training set is used to establish the geological recognition system, while the test set is not involved in training and is used for independent testing of the recognition results. To obtain the geological labels for the supervised classification algorithms, the geological survey report from the geological exploration is used as prior information. Table 2 lists the basic statistical characteristics of some representative TBM tunneling parameters in Tianjin Metro Line 9.

3.1. Implementation of Feature Selection and Performance of the Geological Recognition System. In this section, the recognition accuracy and computational time of the four geological classifiers are discussed for different numbers of selected features, using the aforementioned engineering example.

When feature engineering deals with high-dimensional problems, the training precision, computational cost, and possible overfitting must be considered together in choosing an appropriate number of features as the input for classifier training. Therefore, the effect of the number of features on the recognition accuracy of the four types of geological classifiers is discussed first.

The hyperparameters of the algorithms are set as follows: in the ANN, the number of hidden layers is 4 and the number of nodes in each layer is 10; the distance metric for KNN is the Euclidean distance; the kernel function of SVM is the radial basis function; and the criterion in CART is the Gini coefficient.

For Tianjin Metro Line 9, the geological recognition results of KNN, SVM, ANN, and CART are shown in Figure 5. Each point is the accuracy from 10-fold cross validation. The figure shows that, as features are added to the input in descending order of chi-square value, the accuracy of the geological recognition models gradually increases. Eventually, the algorithms achieve good recognition performance, with the accuracy exceeding 96% once the number of features K > 4. The performance of the algorithms is discussed below based on the results in Figure 5. Among the four algorithms, the recognition performance of KNN is notably good, and its classification accuracy reaches 99.9% when K = 3 in KNN, probably because TBM construction is a continuous process, so that KNN can find similar samples for its decisions more effectively in the densely sampled TBM data. The performance of SVM is inferior to that of the other algorithms, which may be caused by the difficulty of selecting hyperparameters given the complex distribution and noise of TBM in-situ data.

Table 1: Confusion matrix for a binary classification problem.

| | Positive | Negative |
|---|---|---|
| True | True positive (TP) | True negative (TN) |
| False | False positive (FP) | False negative (FN) |

Figure 5 also provides a reference for the number of inputs in this multiple-input problem. For most algorithms, the recognition results are generally good when K = 7. With further features, the accuracy improves only slightly. Considering the calculation and feature acquisition costs and the complexity of the recognition system, the K = 7 feature combination is used as the optimal feature selection strategy for geological recognition in this work. In addition, since feature selection is an inherent part of the CART algorithm, CART does not take part in the chi-square test discussion, and its number of input features can be controlled by adjusting the depth of the tree. The top 7 features selected by this method and their chi-square values and P-values are shown in Table 3.
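The strategy of adding features in ranked order until the accuracy levels off can be sketched as a simple loop. The paper chooses the number of features by inspecting Figure 5, so the numerical stopping tolerance below is an assumption, and the accuracy curve is invented for illustration and is not the data behind Figure 5.

```python
def select_feature_count(ranked_features, evaluate, tolerance=0.001):
    """Grow the input in ranked order; keep a larger set only if accuracy
    improves on the best value so far by more than `tolerance`."""
    best_k, best_acc = 0, 0.0
    for k in range(1, len(ranked_features) + 1):
        acc = evaluate(ranked_features[:k])  # e.g. 10-fold cross-validated accuracy
        if acc - best_acc > tolerance:
            best_k, best_acc = k, acc
    return best_k, best_acc


# Hypothetical feature names and accuracy curve (NOT results from the paper).
ranked = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9"]
curve = [0.70, 0.85, 0.93, 0.96, 0.97, 0.975, 0.979, 0.9795, 0.9797]


def evaluate(features):
    """Stand-in for training and cross-validating a classifier on `features`."""
    return curve[len(features) - 1]


chosen_k, chosen_acc = select_feature_count(ranked, evaluate, tolerance=0.003)
```

With this invented curve the loop settles on seven features, because the eighth and ninth add less than the tolerance; a different tolerance or curve would give a different count.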

To discuss the dependence of the proposed geological recognition system on the amount of training data, 10% of the samples are randomly selected from the engineering dataset as the training set, and the remaining 90% of the samples are used as an independent test set. The above feature selection results are used as the input for training the classifiers, and the recognition accuracy is validated again on the independent test set. The computational costs of training and prediction for the four types of classifier are compared, and the results are shown in Table 4. For the SVM, ANN, and KNN classifiers, the computational time of the chi-square test is excluded, and the duration of each algorithm from training set fitting to test set prediction is measured. Note that the feature selection of CART is included in its training process. Table 4 demonstrates that, even when only 10% of the samples are used for training, the optimal feature combination chosen by the feature selection algorithm still retains excellent recognition performance when 90% of the samples are used for prediction and validation.

For Tianjin Metro Line 9, the computational cost of the CART-based geological classifier is significantly smaller than those of the other three. The prediction time of the KNN classifier is significantly longer than those of the other three classifiers, because KNN is an instance-based algorithm whose training process only stores the samples. Each prediction then requires calculating the distances between the point to be predicted and all the sample points in the training set, which results in a long prediction time for large amounts of data. For the other classifiers, prediction on the test set is relatively fast once training is completed.

3.2. Generalization Ability of Geological Recognition Systems. The generalization ability is an important indicator of whether a learner overfits and is a prerequisite for the practical application of the proposed method to engineering problems. It generally refers to the adaptability of a machine learning method when predicting new data, that is, whether a reasonable output can still be achieved when a dataset outside the training set is given. In this work, instead of using the data from Tianjin Metro Line 9, Tianjin Metro Line 3 is used as the engineering example for generalization validation. The statistical characteristics of its dataset involved in the calculation are shown in Table 5.

-is project and Tianjin Metro Line 9 were bothconstructed using the same EPB shield and they are lo-cated in the same city with similar geological conditionsIn this section the generalization of the geological rec-ognition system proposed in this paper is investigated byusing the geological recognition system established on thebasis of the in-situ data of the Tianjin Metro Line 9 projectto the geological recognition of Tianjin Metro Line 3which is another project independent of Tianjin MetroLine 9 Considering the difference of the parameters in

Table 2 Statistical properties of main parameters from the selected section of Tianjin Metro Line 9

Advance rate (mmmin) Cylinder thrust (kN) Cutter head torque (kNm) Cutter head rotational speed (rmin)Max 5568 2470168 170628 11Min 1684 1043607 80308 04Average 3153 1584043 126449 094SD 1004 290533 14871 014

Figure 5: Identification accuracy of Tianjin Metro Line 9 with the increase of input features. Horizontal axis: number of features; vertical axis: accuracy; curves: SVM, KNN, CART, and ANN.


different engineering datasets of Line 9 and Line 3, the feature selection method introduced in Section 2.2 is used to select the important features in both engineering datasets as the input for training the geological recognition system. Based on the training set data from Tianjin Metro Line 9, three classification algorithms (ANN, SVM, and KNN) are used to establish the geological recognition system. The recognition performance is verified using the test set data from Line 3. The evaluation indicators for recognition performance are shown in Figure 6. The geological recognition system trained using the engineering data of Line 9 can effectively recognize the similar geology in the Line 3 project, such as muddy clay and silty clay. The recognition accuracies of the three types of classifiers are all above 90%. Among the classifiers, the ANN outperforms the other two algorithms, and the algorithms based on KNN and SVM have similar prediction results. The results show that the geological recognition system based on the existing training engineering data can give a reasonable output when applied to new datasets from different projects with similar geological conditions, demonstrating a good

Table 3: Chi-square value and P-value of top 7 features in Tianjin Metro Line 9

| Name | Chi-square value | P-value |
| --- | --- | --- |
| Accumulated rotation number of screw machine | 491395 | <1e-275 |
| Primary voltage of high-voltage transformer | 436052 | <1e-275 |
| Filling mud pressure no. 2 | 369276 | <1e-275 |
| Initial pitch angle y | 362604 | <1e-275 |
| Upper control limit of soil discharge quantity | 333961 | <1e-275 |
| Standard deviation of soil discharge quantity | 331144 | <1e-275 |
| Target value of soil pressure | 316243 | 4.23e-275 |

Table 4: The recognition performance of classifiers in Tianjin Metro Line 9 when the ratio of training to test set is 1:9

| | Accuracy (%) | Training time (s) | Test time (s) |
| --- | --- | --- | --- |
| KNN | 99.8 | 0.29 | 81.45 |
| SVM | 95.9 | 3.21 | 5.60 |
| ANN | 97.3 | 5.76 | 0.03 |
| CART | 99.9 | 0.03 | 0.02 |


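Table 5: Statistical properties of main parameters of Tianjin Metro Line 3

| | Advance rate (mm/min) | Cylinder thrust (kN) | Cutter head torque (kN·m) | Cutter head rotational speed (r/min) |
| --- | --- | --- | --- | --- |
| Max | 39.80 | 27447.14 | 1539.93 | 0.8 |
| Min | 18 | 15511.25 | 1030.53 | 0.5 |
| Average | 31.53 | 21198.53 | 1279.04 | 0.65 |
| SD | 0.53 | 3814.17 | 86.89 | 0.15 |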

Figure 6: Recognition performance in Tianjin Metro Line 3 using geological classifiers trained by data from Tianjin Metro Line 9. Horizontal axis: classifiers with different algorithms (SVM, KNN, ANN); vertical axis: average evaluation; series: precision, recall, and F1-score.


generalization ability of the proposed recognition systems.
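Figure 6 reports average precision, recall, and F1-score for the classifiers evaluated on Line 3. The sketch below shows one way such averages could be computed from true and predicted labels. The averaging scheme is not stated in the text, so macro averaging over geological classes is an assumption here, and the function name `macro_scores` and the label lists are invented for illustration. The F1-score is the harmonic mean of precision and recall.

```python
def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over all classes."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1_scores = [], [], []
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1_scores) / n


# Hypothetical labels for a test set from a second project (not paper data).
true_labels = ["muddy clay"] * 4 + ["silty clay"] * 4
pred_labels = ["muddy clay", "muddy clay", "muddy clay", "silty clay",
               "silty clay", "silty clay", "silty clay", "muddy clay"]

avg_precision, avg_recall, avg_f1 = macro_scores(true_labels, pred_labels)
print(avg_precision, avg_recall, avg_f1)
```

In a cross-project check of this kind, the classifier would be fitted only on data from the first project and these scores computed only on data from the second.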

4 Conclusions

This paper proposes a method for geological type recognition based on the in-situ data recorded during TBM construction. The main conclusions of this research can be summarized as follows:

(1) The proposed method consists of feature engineering and machine learning classification. Recognition based on the analysis of TBM in-situ data can effectively mine the internal relationships between variables and provides an effective way to obtain geological information for construction decision-making.

(2) In feature engineering, because TBM in-situ data do not follow common distributions, the chi-square test is chosen for feature selection. Four machine learning classification algorithms (ANN, SVM, KNN, and CART) are used to handle the nonlinear coupling between features.

(3) The proposed method is applied to the geological recognition of urban metro projects constructed with EPB-TBMs in China. The comparison between the recognition results and the measured geology types shows that the proposed method is effective. The recognition accuracy gradually increases with the number of input features and eventually levels off, at which point the accuracy of all algorithms is higher than 97%. Based on this trend, a selection strategy for the optimal input features is also given: the optimal number of input variables for this validation case is seven.

(4) Studies of more advanced applications would be worthwhile. It is recommended that a database with more comprehensive geological types (such as hard rock, composite ground) be established and analyzed through the presented learning procedure. Moreover, an intelligent visual interface could be built on the proposed system for more convenient application.
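The selection strategy in conclusion (3), choosing the number of input features at which accuracy levels off, can be sketched as follows. The plateau criterion (the smallest feature count whose accuracy lies within a tolerance of the best observed accuracy), the function name `smallest_plateau_size`, the tolerance, and the accuracy curve are all assumptions made for illustration; the paper does not state a numerical rule.

```python
def smallest_plateau_size(accuracy_by_count, tolerance=0.005):
    """Return the smallest feature count whose accuracy is within
    `tolerance` of the best accuracy observed on the curve."""
    best = max(accuracy_by_count.values())
    for count in sorted(accuracy_by_count):
        if best - accuracy_by_count[count] <= tolerance:
            return count
    raise ValueError("empty accuracy curve")


# Invented accuracy curve (number of features -> accuracy), not paper data.
curve = {1: 0.70, 3: 0.85, 5: 0.95, 7: 0.985, 9: 0.986, 11: 0.987}
chosen = smallest_plateau_size(curve)
print(chosen)
```

A tighter tolerance favours more features, and a looser one favours a smaller, cheaper input set.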

Data Availability

The data used in this paper are available from the relevant engineering enterprises and have not been publicly released for commercial reasons.

Conflicts of Interest

All authors declare that there are no conflicts of interest.

Acknowledgments

This research was supported by the National Key R&D Program of China (grant no. 2018YFB1702505), the National Natural Science Foundation of China (grant no. 11872269), and the Natural Science Foundation of Tianjin (grant no. 18JCYBJC19600). Support from Prof. Wei Guo of the School of Computer Science and Technology of Tianjin University for the research in this paper is greatly appreciated.

References

[1] G. Barla and S. Pelizza, TBM Tunnelling in Difficult Ground Conditions, Proceedings of the International Conference on Geotechnical & Geological Engineering, Melbourne, Australia, pp. 329–354, Technomic Publishing Company, 2000.

[2] D. J. Armaghani, M. Koopialipoor, A. Marto, and S. Yagiz, "Application of several optimization techniques for estimating TBM advance rate in granitic rocks," Journal of Rock Mechanics and Geotechnical Engineering, vol. 11, no. 4, pp. 779–789, 2019.

[3] A. K. Agrawal, V. M. S. R. Murthy, and S. Chattopadhyaya, "Investigations into reliability, maintainability and availability of tunnel boring machine operating in mixed ground condition using Markov chains," Engineering Failure Analysis, vol. 105, pp. 477–489, 2019.

[4] X. Huang, Q. Liu, K. Shi, Y. Pan, and J. Liu, "Application and prospect of hard rock TBM for deep roadway construction in coal mines," Tunnelling and Underground Space Technology, vol. 73, pp. 105–126, 2018.

[5] M. Entacher, G. Winter, and R. Galler, "Cutter force measurement on tunnel boring machines-implementation at Koralm tunnel," Tunnelling and Underground Space Technology, vol. 38, no. 3, pp. 487–496, 2013.

[6] B. Galler, Q. Guo, Z. Liu et al., "Comprehensive ahead prospecting for hard rock TBM tunneling in complex limestone geology: a case study in Jilin, China," Tunnelling and Underground Space Technology, vol. 93, Article ID 103045, 2019.

[7] H. Lan, Y. Xia, Z. Zhu, Z. Ji, and J. Mao, "Development of on-line rotational speed monitor system of TBM disc cutter," Tunnelling and Underground Space Technology, vol. 57, pp. 66–75, 2016.

[8] B. Tang, H. Cheng, Y. Tang et al., "Experiences of gripper TBM application in shaft coal mine: a case study in Zhangji coal mine, China," Tunnelling and Underground Space Technology, vol. 81, pp. 660–668, 2018.

[9] J. Li, L. Jing, X. Zheng, P. Li, and C. Yang, "Application and outlook of information and intelligence technology for safe and efficient TBM construction," Tunnelling and Underground Space Technology, vol. 93, p. 103097, 2019.

[10] T. Krause, "Schildvortrieb mit flüssigkeits- und erdgestützter Ortsbrust," Technical University of Braunschweig, Braunschweig, Germany, Dissertation, 1987.

[11] O. T. Blindheim, Boreability Predictions for Tunneling, The Norwegian Institute of Technology, Trondheim, Norway, 1979.

[12] A. Bruland, Hard Rock Tunnel Boring, Faculty of Engineering Science & Technology, vol. 3, no. 1, Male, Maldives, 2000.

[13] Z. Qian, Y. Kang, Z. Zheng, and L. Wang, "Inverse analysis and modeling for tunneling thrust on shield machine," Mathematical Problems in Engineering, vol. 2013, Article ID 975703, 9 pages, 2013.

[14] E. Avunduk and H. Copur, "Empirical modeling for predicting excavation performance of EPB TBM based on soil properties," Tunnelling and Underground Space Technology, vol. 71, pp. 340–353, 2018.


[15] F. J. Macias, P. D. Jakobsen, Y. Seo, and A. Bruland, "Influence of rock mass fracturing on the net penetration rates of hard rock TBMs," Tunnelling and Underground Space Technology, vol. 44, pp. 108–120, 2014.

[16] G. Armetti, M. R. Migliazza, F. Ferrari, A. Berti, and P. Padovese, "Geological and mechanical rock mass conditions for TBM performance prediction. The case of "La Maddalena" exploratory tunnel, Chiomonte (Italy)," Tunnelling and Underground Space Technology, vol. 77, pp. 115–126, 2018.

[17] I. M. Vergara and C. Saroglou, "Prediction of TBM performance in mixed-face ground conditions," Tunnelling and Underground Space Technology, vol. 69, pp. 116–124, 2017.

[18] S. Yagiz, "Utilizing rock mass properties for predicting TBM performance in hard rock condition," Tunnelling and Underground Space Technology, vol. 23, no. 3, pp. 326–339, 2008.

[19] C. Zhou, T. Kong, Y. Zhou, H. Zhang, and L. Ding, "Unsupervised spectral clustering for shield tunneling machine monitoring data with complex network theory," Automation in Construction, vol. 107, Article ID 102924, 2019.

[20] D. Bouayad and F. Emeriault, "Modeling the relationship between ground surface settlements induced by shield tunneling and the operational and geological parameters based on the hybrid PCA/ANFIS method," Tunnelling and Underground Space Technology, vol. 68, pp. 142–152, 2017.

[21] S. Mahdevari, S. R. Torabi, and M. Monjezi, "Application of artificial intelligence algorithms in predicting tunnel convergence to avoid TBM jamming phenomenon," International Journal of Rock Mechanics and Mining Sciences, vol. 55, pp. 33–44, 2012.

[22] K.-C. Hyun, S. Min, H. Choi, J. Park, and I.-M. Lee, "Risk analysis using fault-tree analysis (FTA) and analytic hierarchy process (AHP) applicable to shield TBM tunnels," Tunnelling and Underground Space Technology, vol. 49, pp. 121–129, 2015.

[23] A. Salimi, J. Rostami, C. Moormann, and A. Delisio, "Application of non-linear regression analysis and artificial intelligence algorithms for performance prediction of hard rock TBMs," Tunnelling and Underground Space Technology, vol. 58, pp. 236–246, 2016.

[24] W. Sun, M. Shi, C. Zhang, J. Zhao, and X. Song, "Dynamic load prediction of tunnel boring machine (TBM) based on heterogeneous in-situ data," Automation in Construction, vol. 92, pp. 23–34, 2018.

[25] G. Javad and T. Narges, "Application of artificial neural networks to the prediction of tunnel boring machine penetration rate," Mining Science and Technology (China), vol. 20, no. 5, pp. 727–733, 2010.

[26] A. C. Adoko, C. Gokceoglu, and S. Yagiz, "Bayesian prediction of TBM penetration rate in rock mass," Engineering Geology, vol. 226, pp. 245–256, 2017.

[27] S. E. Seker and I. Ocak, "Performance prediction of roadheaders using ensemble machine learning techniques," Neural Computing and Applications, vol. 31, no. 4, pp. 1103–1116, 2019.

[28] X. Gao, M. Shi, X. Song, C. Zhang, and H. Zhang, "Recurrent neural networks for real-time prediction of TBM operating parameters," Automation in Construction, vol. 98, pp. 225–235, 2019.

[29] S. Li, B. Liu, X. Xu et al., "An overview of ahead geological prospecting in tunneling," Tunnelling and Underground Space Technology, vol. 63, pp. 69–94, 2017.

[30] T. Yamamoto, S. Shirasagi, S. Yamamoto, Y. Mito, and K. Aoki, "Evaluation of the geological condition ahead of the tunnel face by geostatistical techniques using TBM driving data," Tunnelling and Underground Space Technology, vol. 18, no. 2-3, pp. 213–221, 2003.

[31] C. Zhou, L. Y. Ding, M. J. Skibniewski, H. Luo, and H. T. Zhang, "Data based complex network modeling and analysis of shield tunneling performance in metro construction," Advanced Engineering Informatics, vol. 38, pp. 168–186, 2018.

[32] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proceedings of the International Joint Conference on Artificial Intelligence, Quebec, Canada, August 1995.

[33] T. Cover and P. Hart, "Nearest neighbor pattern classification," IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21–27, 1967.

[34] B. E. Boser, I. M. Guyon, and V. N. Vapnik, "A training algorithm for optimal margin classifier," in Proceedings of the Workshop on Computational Learning Theory, Pittsburgh, PA, USA, July 1992.

[35] M. T. Hagan, M. Beale, and M. Beale, Neural Network Design, MIT Press, Cambridge, MA, USA, 2002.

[36] L. Breiman, Classification and Regression Trees, Routledge, Abingdon, UK, 2017.


poor service conditions and complex muddy environment make it very inconvenient to observe the construction state of a TBM. Therefore, the monitoring data acquired and recorded by the various sensors mounted on the main parts of the machine form an important information basis for understanding the working state of the equipment [5–8]. Current TBMs can simultaneously monitor hundreds of operation parameters, such as the tunneling rate, cutter head rotational speed, cylinder thrust, cutter head torque, and sealed chamber pressure. These in-situ data contain rich information on the interactions between the machine and the surrounding environment. The rapid development of technologies such as big data and artificial intelligence in recent years provides a more effective path for in-depth exploration of the information in in-situ data, toward the informatization and intelligence of TBM construction [9].

In the early stage, TBM engineering data were used to establish empirical models for conveniently solving practical problems. For example, Krause [10] used data from hundreds of TBM construction projects in Germany and Japan to derive an empirical prediction range for the tunneling load. Other classic empirical models include the NTNU model [11], developed by the Norwegian University of Science and Technology, and the improved model of Bruland [12], which are often used in engineering to predict the rate of penetration. Building on empirical models, many scholars have proposed parameter prediction models based on statistical analysis of engineering data. For example, Zhang et al. [13] established a tunneling load prediction model for the earth pressure balance tunnel boring machine (EPB-TBM) by combining regression analysis of engineering data with dimensional analysis. Avunduk and Copur [14] established a nonlinear regression model of the rate of penetration using several soil property parameters, such as particle size distribution and natural water content. Macias et al. [15] analyzed how the predicted rate of penetration of a hard rock TBM changes under different fracturing conditions and identified the fracturing coefficient as an effective index of the influence of rock fracturing on tunneling performance. Armetti et al. [16] performed regressions on the single indices affecting the rate of penetration to analyze the degree of influence of different parameters in the empirical model on tunneling performance. Vergara and Saroglou [17] established a regression relationship between the weighted rock mass rating and the mixed-face penetration index under mixed geology, considering the proportions of rock and soil in the tunneling face. Yagiz [18] set up a regression equation to predict tunneling performance in jointed and faulted rock based on rock properties such as the distance between planes of weakness and the orientation of discontinuities in the rock mass. These works based on TBM in-situ data mainly focus on basic statistical regression of key tunneling performance parameters, with some limitations on the applicable problems and the number of features that can be considered. Statistical regression can extract and describe regularities in the data, but a TBM is a complex engineering system with hundreds of parameters collected during operation. Moreover, TBM in-situ data are often characterized by many influencing factors and nonlinear coupling between parameters, which makes in-depth data mining difficult [19]. The valuable information hidden within the massive monitoring data remains to be explored.

In recent years, machine learning algorithms have developed rapidly. Because of their excellent nonlinear expression ability and adaptability to massive data, they provide powerful tools for TBM in-situ data analysis. Some typical works are as follows. Bouayad and Emeriault [20] established a prediction model of the ground settlement caused by an earth pressure balance shield machine through principal component analysis (PCA) and an adaptive neuro-fuzzy inference system (ANFIS). Mahdevari et al. [21] used a support vector machine and an artificial neural network to predict the tunnel convergence caused by ground compression and verified the model outputs against measured data from engineering examples. Hyun et al. [22] combined fault tree analysis (FTA) and the analytic hierarchy process (AHP) to analyze the risk and probability of shield construction and, given the good consistency obtained, constructed a risk management system. Salimi et al. [23] used nonlinear regression and artificial intelligence algorithms to predict the performance of hard rock TBMs. Sun et al. [24] established a random forest model to predict the dynamic load of shield tunneling. Gholamnejad and Tayarani [25] used an artificial neural network to predict the rate of penetration from three rock mass parameters (uniaxial tensile strength, rock quality index, and weak face spacing) and evaluated the results with different hidden-layer settings. Adoko et al. [26] proposed a Bayesian method for predicting the TBM penetration rate in rock mass. Seker and Ocak [27] compared random forest and other ensemble learning algorithms for predicting the rate of penetration. Gao et al. [28] used several kinds of recurrent neural networks to analyze the sequential patterns of TBM performance parameters so as to predict the important performance parameters in advance.

Figure 1 Schematic diagram of TBM underground construction




2 Methods




[Figure: flowchart of the method, from Start to End, with the steps preprocessing of TBM in-situ data and geological labels, min-max normalization, and chi-square test, and a Yes/No decision.]






$$x_{\mathrm{pre}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$
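Equation (1) rescales each parameter to the interval [0, 1] using its minimum and maximum. A minimal sketch of this min-max normalization for a single parameter is given below. The function name `min_max_normalize`, the handling of a constant column (mapped to zeros), and the sample values are assumptions for illustration and are not taken from the paper.

```python
def min_max_normalize(values):
    """Apply x_pre = (x - x_min) / (x_max - x_min) to one parameter column."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant column carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


# Made-up readings of one operating parameter (not data from the paper).
readings = [10.0, 20.0, 40.0]
normalized = min_max_normalize(readings)
print(normalized)
```

Normalizing every parameter to the same range keeps quantities with large magnitudes, such as thrust in kN, from dominating distance-based classifiers.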





$$\chi^{2} = \sum_{i=1}^{k} \frac{(O_i - E_i)^{2}}{E_i} \tag{2}$$
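The chi-square statistic of equation (2) can be sketched for a single feature as follows, assuming the feature has been discretized into bins and cross-tabulated against the geological label. The binning, the function name `chi_square_statistic`, and the two contingency tables are hypothetical; the paper's exact procedure is not reproduced here. Expected counts are computed under independence of feature bin and label.

```python
def chi_square_statistic(observed):
    """Pearson chi-square for a contingency table.

    Rows are feature bins, columns are geological classes. Assumes that no
    row or column total is zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected
    return statistic


# Hypothetical counts: one feature that separates two classes, one that does not.
informative_feature = [[40, 10], [10, 40]]
uninformative_feature = [[25, 25], [25, 25]]

scores = {
    "informative": chi_square_statistic(informative_feature),
    "uninformative": chi_square_statistic(uninformative_feature),
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Features are then ranked by their statistic, and the highest-scoring ones are kept as classifier inputs.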












[Figure: ten-fold cross-validation. The dataset D is divided into subsets D1 to D10; in each of Evaluations 1 to 10, nine subsets are used for training and the remaining subset is used for testing.]
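The ten-subset scheme in the figure, where D is divided into D1 to D10 and each evaluation holds out one subset, can be sketched as an index generator. The function name `k_fold_splits` is hypothetical, and the sketch assigns contiguous blocks without shuffling, which is an assumption; in practice the samples would usually be shuffled first.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k evaluations."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [j for i, fold in enumerate(folds)
                     if i != held_out for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_splits(n_samples=25, k=10))
for number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"Evaluation {number}: {len(train_idx)} training, {len(test_idx)} test")
```

Averaging the score over the ten evaluations gives an estimate that depends less on any single train/test split.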



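[Figure: schematic diagrams of the four classifiers. (a) KNN, with neighbourhoods for K = 5 and K = 8; (b) SVM, with the support vectors marked; (c) ANN, with an input layer (x1, x2, ..., xn), a hidden layer, and an output layer (y1, ..., yn); (d) CART, a tree of If tests whose True and False branches lead to results.]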





$$\mathrm{AR} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{PR} = \frac{TP}{TP + FP}, \qquad \mathrm{RE} = \frac{TP}{TP + FN} \tag{3}$$
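As a worked instance of equation (3), the sketch below evaluates AR, PR, and RE from the four confusion-matrix counts. The function name `rates_from_counts` and the counts are invented for illustration, and returning 0.0 when a denominator is zero is an assumption rather than part of the definition.

```python
def rates_from_counts(tp, tn, fp, fn):
    """Return (accuracy AR, precision PR, recall RE) from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Assumption: report 0.0 when a ratio is undefined (zero denominator).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Invented counts for one geological class treated as the positive class.
accuracy, precision, recall = rates_from_counts(tp=90, tn=85, fp=15, fn=10)
print(accuracy, precision, recall)
```

With these counts, AR = 175/200, PR = 90/105, and RE = 90/100.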




















