Intelligent Inst

Transcript of Intelligent Inst

  • 8/13/2019 Intelligent Inst

    1/12

    Journal of Research of the National Bureau of Standards, Volume 90, Number 6, November-December 1985

    Intelligent Instrumentation

    Alice M. Harper
    University of Texas at El Paso, El Paso, TX 79968

    and

    Shirley A. Liebman

    Aberdeen Proving Grounds, MD 21105,50666

    Accepted: July 1, 1985

    Feasibility studies on the application of multivariate statistical and mathematical algorithms to chemical problems have proliferated over the past 15 years. In contrast to this, most commercially available computerized analytical instruments have used in the data systems only those algorithms which acquire, display, or massage raw data. These techniques would fall into the preprocessing stage of sophisticated data analysis studies. An exception to this, of course, is the effort of instrument manufacturers in the area of spectral library search. Recent firsthand experiences with several groups designing instruments and analytical procedures for which rudimentary statistical techniques were inadequate have focused efforts on the question of multivariate data systems for instrumentation. That a sophisticated and versatile mathematical data system must also be intelligent (not just a number cruncher) is an overriding consideration in our current development. For example, consider a system set up to perform pattern recognition. Either all users need to understand the interaction of data structures with algorithm type and assumptions or the data system must possess such an understanding. It would seem, in such cases, that the algorithm driver should include an expert system specifically geared to mimic a chemometrician as well as one to aid interpretation in terms of the chemistry of a result. Three areas of modern analysis will be discussed: 1) developments in the area of preprocessing and pattern recognition systems for pyrolysis gas chromatography and pyrolysis mass spectrometry; 2) methods projected for the cross interpretation of several analysis techniques such as several spectroscopies on single samples; and 3) the advantages of having well defined chemical problems for expert systems/pattern recognition automation.

    Key words: data systems, intelligent; instrumentation; multivariate algorithms, statistical and mathematical; pattern recognition; preprocessing; pyrolysis gas chromatography; pyrolysis mass spectrometry.

    Modern computer hardware and software technologies have revolutionized the direction of analytical chemistry over the past 15 years. Standard multivariate statistical techniques applied to optimization and control of instrumentation as well as routine decision making are at the forefront of new instrumental methods such as biomedical 3-dimensional scanners and pyrolysis MS and GC/MS as well as more established measurement techniques. Despite these advances, little attention has been paid to the exploitation of intelligent computerized instrumentation in the design phase of chemical research.

    About the Authors: Alice M. Harper is with the Chemistry Department at the University of Texas at El Paso. Shirley A. Liebman is with the Ballistics Research Laboratories at Aberdeen Proving Grounds.

    Instrumental intelligence is the ability of a scientific instrument to perform a single or several intelligence functions in such a way that operations normally performed by the scientist are completely under automated computer control and decision making. Under this definition, intelligent instruments are quite common. Indeed, in recent years manufacturers of small scientific equipment have used the term "intelligent" in conjunction with single purpose items such as recorders to describe the addition of software and/or programmability to the device. Concurrent with this, larger scientific instruments have been marketed with data systems hosting a wide variety of intelligence functions including control and optimization of instrumental variables, optional modes of experimental design, signal averaging, filtering and integration as well as post analysis data massaging and library search interpretation. Although instruments with the software to perform sophisticated intelligence operations exist, they are not so readily marketed as intelligent instruments. For example, modern pulsed Fourier transform nuclear magnetic resonance (NMR) spectrometers have microcomputers built into the system, operate over a wide range

    of NMR experimental designs, and control instrumental parameters; however, decision making is, for the most part, an operator based function. For this reason, such instruments have limited intelligence in comparison to the level of intelligence required to carry an experiment to completion without extensive interaction with the operator. In fact, it could be that more intelligent research instruments might also be less versatile. The limitation is the current state of technology of intelligence programming. The instrument, if used for routine analysis where problem statements can be well defined, could operate with no loss of utility as a totally automated and intelligent instrument.

    Figure 1 gives insight into the problems arising in creating intelligence programs for instrumentation. Figure 1a is an analytical chemist's perception of a totally automated experimental design [1]. As can be seen, the experiment remains unspecified without an initial problem statement. The issue of intelligent data systems for single instruments is one of either defining the set of all possible problem statements in an evolutionary design or restricting the analysis to a single well defined problem. Figure 1b is an example of a first stage multivariate data analysis system proposed for research applications in pyrolysis mass spectrometry [2]. The design is one that leaves the problem statement, data interpretation and decision making entirely in the hands of the scientist. What is really described in this figure is a statistical package residing on a microcomputer which receives data from the mass spectrometer. However, as Isenhour has

    Figures in brackets indicate literature references.

    Figure 1. Four designs for instrument data systems.
    a. Totally automated experimental design and decision making (problem statement, hypothesis, experiment, measurements, data, solution).
    b. Multivariate analysis for spectral instruments (data acquisition; spectral normalization; univariate and bivariate analysis; factor analysis and multidimensional mapping; supervised pattern recognition; graphics).
    c. Expert systems driven single purpose instrument (experimental design, control, optimization, and data analysis and interpretation, each with an expert system and knowledge base).
    d. Laboratory automation using expert systems drivers (expert systems for experimental design; instruments 1 to N, each with control, optimization, preprocessing, and interpretation; data base management; expert systems for data analysis and interpretation).

    demonstrated, multivariate factoring of spectral libraries offers advantages in the interpretation of complex single spectra [3]. It would seem that the incorporation of a multivariate statistics package in the data system is a key element in intelligence programming for many instruments. Figure 1c incorporates the expert system approach to intelligence for instrumental systems. Under this design, the instrument can accommodate a series of problem statements and decision networks at various analysis stages and uses an intelligent driver for the multivariate analysis and interpretive stages of the analysis. Figure 1d places such a system into the research laboratory controlling a variety of instruments and interpreting results based on one or more analyses.

    Although the diagrams of figures 1b-1d are not comprehensive designs for total automation, they do provide a hierarchy for linking problem statements and decision making into multivariate research problems. After a brief discussion of components of intelligence designs, an example will be presented of the feasibility of developing expert system data reduction for pyrolysis analysis problems.

    Problem Statements: Ideally a problem statement is analogous to the standard hypothesis in statistical analysis. Under the hypothesis a knowledge base can be collected and the hypothesis tested. For example, a patient does or does not carry a genetic trait [4]. Unfortunately, chemical research problems are often ill-specified and the problem statement may become a hierarchy of data investigations leading to one or more problem statements. For example: 1) are there differences in the chemical composition of a series of samples?; if differences do occur, 2) what is the nature of the differences?; 3) do the chemical differences correlate with observed changes in physical properties?; and 4) can physical properties be predicted from the chemical differences? [5,6].

    Knowledge Base: In order for an instrument to operate as an intelligent system, it must have the knowledge base necessary to arrive at a solution for each of its problem statements. This knowledge base may contain data, rules ("Mass peak 94 is phenol"), programs, and heuristic knowledge.

    Consider the knowledge base required for setting up and operating routine analyses of polymer composition by pyrolysis gas chromatography. Possible problem statement areas include optimization of pyrolysis parameters, chromatographic conditions and interface characteristics, control of instrument and data acquisition parameters, data reduction, and data interpretation. The knowledge base must include all information necessary to each of the proposed problem statements. For optimization of parameters, rules governing the detection of an optimum, algorithms (e.g., simplex [7]) for efficiently moving toward an optimum, rules for hierarchical movements within the algorithm, rules for the detection of poor optimization surface structure, and representative previous optima data might be employed in the decision making. Such optima will be determined, in part, by the polymer degradation characteristics and therefore will not, in this case, be independent of the samples used in the analysis. Instrumental control might employ a knowledge base of rules for the automation of events such as the initiation of sample pyrolysis and data collection. Data reduction for this method will require the rules and data necessary for baseline correction, chromatographic normalization, and peak matching. The knowledge base requires a memory of previously collected data to aid peak matching protocols, rules for baseline determinations, transformations for baseline correction, normalization rules and algorithms, and rules for acceptance or rejection of the chromatogram under consideration. Data interpretation, on the other hand, might require a library of previous chromatograms as well as rules and/or algorithms for interpretation of the current event based on a knowledge of past data with verified interpretations.

    Development of the knowledge base is the expensive and time consuming operation in the development of instrumental intelligence even when the application is highly specific. It must be remembered that each operation that a human might perform automatically from experience must be programmed into the data system. For this reason, among the attributes of the system, there needs to be evolutionary operation. In other words, in addition to long term knowledge, facts about the current data and new conjectures under consideration must also be easily accommodated.

    Expert Systems Driven Multivariate Data Systems: The response of many modern chemical instruments (e.g., spectrometers and chromatographs) is inherently multivariate. For such instruments, data reduction and interpretation often consume a greater portion of analysis time than data collection, and require scientific expertise. The time delay between data collection and decision making has become an acute problem for newer hyphenated techniques, such as gas chromatography-mass spectrometry and mass spectrometry-mass spectrometry, which are capable of collecting thousands of mass spectral peaks in a short period of time.

    One possible solution to the problems imposed by large bodies of data is to incorporate into the instrument a data reduction system consisting of multivariate analysis methods. The problem encountered in the actual implementation of such a system is that few experts in instrumental analysis have the expertise for carrying out multivariate statistical analyses. It has become increasingly apparent that instruments employing versatile multivariate based data systems should be capable of operating in a transparent data analysis mode. In order to accomplish this, the expertise of a chemometrician will need to be programmed into an expert systems driver for the instrument data system. Given a problem statement and a knowledge base of rules from previous experience, the computer could decide from a variety of possibilities how to reduce the data and represent the results in a meaningful form. A relatively simplistic example of this is the problem

    statement "Is there a correlation between two independent variables?" Invisible to the user would be the operations for determining the integrity of the variable distributions, a computation of the correlation, and a determination of the significance of the correlation coefficient of the relationship. The result might take a simple form such as "there appears to be a significant correlation between the first variable and the logarithm of the second variable. Would you like to see the computed values?"
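    The hidden operations behind such a problem statement could be sketched as follows. This is only an illustration under stated assumptions, not the authors' FORTRAN implementation: the distribution check is reduced to trying a logarithmic transform of the second variable, significance and the confidence interval use Fisher's z transformation with a normal approximation (the paper does not name a test), and the function names pearson_r, fisher_summary, and relate_two_variables are hypothetical.

```python
import math
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_summary(r, n, level=0.95):
    """Approximate two-sided p-value and confidence interval via Fisher's z.

    Assumption: n > 3 and the normal approximation is acceptable.
    """
    r = max(-0.999999999999, min(0.999999999999, r))  # keep atanh finite
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z) / se))
    half = NormalDist().inv_cdf(0.5 + level / 2.0) * se
    return p, (math.tanh(z - half), math.tanh(z + half))


def relate_two_variables(x, y, alpha=0.05):
    """Hypothetical driver: try raw and log-transformed y, report the stronger relation."""
    candidates = {"the second variable": list(y)}
    if all(v > 0 for v in y):
        candidates["the logarithm of the second variable"] = [math.log(v) for v in y]
    # Pick the representation of y with the largest absolute correlation.
    label, values = max(candidates.items(), key=lambda kv: abs(pearson_r(x, kv[1])))
    r = pearson_r(x, values)
    p, ci = fisher_summary(r, len(x))
    verdict = "a significant" if p < alpha else "no significant"
    message = (f"There appears to be {verdict} correlation between "
               f"the first variable and {label}.")
    return {"r": r, "p": p, "ci": ci, "message": message}
```

    Calling relate_two_variables on two lists of measurements returns the coefficient, an approximate p-value, the confidence interval, and a one-line message in the style quoted above; the user never has to choose the transform or the test.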

    Demonstration of Expert System for Data Analysis in Curie-point Pyrolysis Mass Spectrometry

    Pyrolysis mass spectrometry has come to the attention of both mass spectrometrists and chemometricians because of its utility in the analysis of polymeric materials and the complexity of the mass spectra produced by natural polymers and biopolymers. The technique involves degradation of the solid material by pyrolysis followed by mass spectrometry of the pyrolysis fragments. It has been demonstrated by the pioneering work of Meuzelaar (for example see [8]) that Curie-point pyrolyses of samples of biomaterials can produce profiles which, when properly normalized [2], are reproducible and diagnostic of the chemical similarities and dissimilarities among groups of samples and are quantitative under appropriate experimental designs [9]. Because the process of pyrolysis followed by mass spectrometry of the network polymer of natural heterogeneous biopolymers produces mass spectra with peaks which tend to be highly correlated, the interpretation problems created by the vast number of, and the overlap of, the masses are solved through multivariate analysis of the mass spectral profiles. A data system for a pyrolysis mass spectrometer would be of limited utility without multivariate statistical methods [2].

    To test the hypothesis that an expert systems approach to data reduction is potentially helpful and feasible, an expert systems driver was implemented to mimic the data analysis portion of a Rocky Mountain Coal study done at the Biomaterials Profiling Center in Utah. Detailed results of the original study can be found in [5,6] and of the numerical methods used in [2]. Briefly, 102 Rocky Mountain Coals were analyzed in quadruplicate by pyrolysis mass spectrometry. The pyrolysis profiles were added to a preexisting data set containing conventional measurements on the same coal samples (see table 1). After normalization of the profiles by the method of Eshuis et al. [10], average mass spectra were analyzed using multivariate analysis techniques.

    Figure 2 is a minimal design for an expert systems driven data acquisition and analysis system for pyrolysis mass spectrometry. Figure 3 shows the design of the data bases for the present appliC

    Figure 2. Expert systems driven pyrolysis-mass spectrometer. Each stage of data acquisition, analysis, and interpretation has decision protocols based on a knowledge base of an expert. Note the interaction between mathematical methods and the expert system chemometrician.
    (Diagram labels: sample pyrolysis; low energy electron impact; quadrupole mass filter; electron multiplier; control; mass calibration / bar spectrum; normalization of spectral sets; graphics; mapping; pattern.)

    samples, the peaks involved are temporarily deleted, the samples previously deleted are brought back into analysis, and the distances reexamined. The expert system decides which will be deleted at this stage, peaks or samples, based on the recomputed distances. The iterative decision making-statistical computation continues until a key set of peaks remains activated for normalization over all spectra. A diagram of the proposed normalization expert system is shown in figure 4.
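    The peak-deletion half of this loop can be illustrated with a much reduced sketch. It is an assumption-laden stand-in, not the system of figure 4: it handles the replicates of a single sample only, replaces the F-ratio and distance rules with a squared coefficient of variation, removes one peak per iteration, and never deletes or reactivates samples. The names normalize and select_normalization_peaks and the cutoff and min_peaks parameters are hypothetical.

```python
from statistics import mean, pvariance


def normalize(spectrum, active):
    """Scale a spectrum so that its intensities over the active peaks sum to 1."""
    total = sum(spectrum[i] for i in active)
    return [v / total for v in spectrum]


def select_normalization_peaks(replicates, cutoff=0.01, min_peaks=2, max_iter=100):
    """Iteratively drop the least reproducible peak from the normalization set.

    replicates: list of replicate spectra (equal-length lists, positive totals).
    cutoff: hypothetical limit on the squared coefficient of variation.
    Returns the sorted indices of the peaks left active for normalization.
    """
    active = set(range(len(replicates[0])))
    for _ in range(max_iter):
        # Renormalize every replicate over the currently active peaks.
        scaled = [normalize(s, active) for s in replicates]
        spread = {}
        for i in active:
            column = [s[i] for s in scaled]
            m = mean(column)
            spread[i] = pvariance(column) / (m * m) if m > 0 else 0.0
        worst = max(spread, key=spread.get)
        if spread[worst] <= cutoff or len(active) <= min_peaks:
            break  # every active peak is reproducible enough: converged
        active.remove(worst)
    return sorted(active)
```

    Removing only the worst peak on each pass matters: an irreproducible peak distorts the normalization total and so makes every other peak look variable until it is taken out, which is one reason the procedure has to iterate.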

    The system, as described, does not completely emulate an expert for all applications of pyrolysis mass spectrometry. It has already been noted that the data evaluation does not address errors which arise from time dependent or systematic error. In addition to this, the decision making process, while designed to emulate the human decision process based on statistical results, does not necessarily operate on a one-to-one correspondence with a human expert. The problem is that two experts working on the same set of data may arrive at approximately the same results via slightly different procedural routes. The same is true of the expert system when compared to a human expert. The major deviations from a

    Figure 4. Interaction of expert systems and statistical computation for a rational normalization process for pyrolysis mass spectrometry.
    (Flow chart steps, repeated at each iteration until convergence:
    1. Program initialization: select samples; sort if necessary; evaluate ion intensities.
    2. Compute mass variances (a. over all samples; b. within replicates); compute sample distances (a. over all samples; b. within replicates); sort all statistics; form F statistic ratios.
    3. Expert system evaluation (a. of mass variations; b. of sample distances).
    4. Expert system decisions (a. delete mass peaks; b. delete samples; c. reactivate mass peaks; d. reactivate samples).
    5. Normalization of spectra; on convergence, quit and save normalization.)

    human procedure found when working with this system are the lack of intuition or "fuzzy logic" that is used by the expert. The expert system converges in more steps than are necessary with human interaction with the statistical algorithms and, for some data bases, has trouble defining convergence at the solution. For example, the optimal cutoff parameters for variance and distance change between data sets. These and other problems are best solved by training the expert system to recognize the structure of a good spectral solution in addition to the structure of good statistics.

    Correlation Based Hypothesis Testing

    Perhaps the most often asked question of large data sets involves finding relationships between the variables or between the variables and an external parameter. Table 2 lists the form of the problem statements included in our data system for this demonstration. The term "relate" invokes one of several multivariate algorithms for the study of correlations in the data. The possible responses are the Pearson correlation coefficient, linear regression, factor analysis, or canonical correlation analysis.
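    How a driver might map a command string onto one of these four responses can be sketched as below. The command grammar and the five cases follow the examples in table 2; everything else is an assumption for illustration, including the regular expression, the function names parse_relate and choose_algorithm, and the returned labels. The original system dispatched to FORTRAN subroutines, which are not reproduced here.

```python
import re

# Grammar taken from table 2:
#   RELATE db1, vars1 TO db2, vars2 OVER samples, list [; INTERPRETE USING db, var]
COMMAND = re.compile(
    r"^\s*RELATE\s+(?P<db1>[^,]+),\s*(?P<vars1>.+?)\s+TO\s+"
    r"(?P<db2>[^,]+),\s*(?P<vars2>.+?)\s+OVER\s+(?P<over>[^;]+?)\s*"
    r"(?:;\s*INTERPRETE\s+USING\s+(?P<using>.+?))?\s*$",
    re.IGNORECASE,
)


def parse_relate(command):
    """Split a RELATE command into lower-cased fields (None when a field is absent)."""
    match = COMMAND.match(command)
    if match is None:
        raise ValueError(f"not a RELATE command: {command!r}")
    return {k: (v.strip().lower() if v else None) for k, v in match.groupdict().items()}


def choose_algorithm(command):
    """Pick the analysis implied by the variable lists, following the table 2 examples."""
    f = parse_relate(command)
    all1, all2 = f["vars1"] == "all", f["vars2"] == "all"
    if all1 and all2:
        if f["db1"] == f["db2"]:
            return "factor analysis"
        return "canonical correlation analysis"
    if all1:
        return "factor analysis with target rotation"
    if all2:
        raise ValueError("single variable TO all is not covered by table 2")
    if f["using"]:
        return "least squares fit with residual evaluation"
    return "correlation coefficient with significance test"
```

    For instance, choose_algorithm("RELATE current data, all TO old data, all OVER samples, match") selects canonical correlation analysis because both variable lists are "all" and the two data bases differ.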

    Consider the problem statement: What is the relationship of peak 34 (H2S) in the mass spectrum of coal to the total organic sulfur from the conventional data matrix? "Relate current data, mass 34 to old data, organic sulfur over samples, all" results in a computation of the correlation coefficient, an estimate of significance, and the confidence interval about the correlation. "Relate current data, mass 34 to old data, organic sulfur over samples, all; Interprete using old data, organic sulfur" results in the additional computation of (organic sulfur) = a (mass 34) + b with residuals e_i. The residual pattern is tested for randomness. A failure results in a search through the library for a reference residual pattern with similarities to the computed residual pattern.
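    A minimal sketch of the fit-then-check step follows. The paper does not say which randomness test was used; the Wald-Wolfowitz runs test on residual signs, with a normal approximation, is assumed here purely for illustration, and the library search that follows a failure is not implemented. The names least_squares, runs_test, and fit_and_check are hypothetical.

```python
import math
from statistics import NormalDist, mean


def least_squares(x, y):
    """Fit y = a*x + b; return the slope, the intercept, and the residuals."""
    mx, my = mean(x), mean(y)
    a = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    b = my - a * mx
    residuals = [v - (a * u + b) for u, v in zip(x, y)]
    return a, b, residuals


def runs_test(residuals):
    """Two-sided p-value of the runs test on residual signs (normal approximation)."""
    signs = [r > 0 for r in residuals if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 1.0  # no sign changes possible; no evidence against randomness
    runs = 1 + sum(s != t for s, t in zip(signs, signs[1:]))
    n = n_pos + n_neg
    mu = 2.0 * n_pos * n_neg / n + 1.0
    var = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1))
    if var <= 0:
        return 1.0
    z = (runs - mu) / math.sqrt(var)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def fit_and_check(x, y, alpha=0.05):
    """Fit the line, then flag a non-random residual pattern for a library search."""
    pairs = sorted(zip(x, y))  # order residuals along x so that runs are meaningful
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    a, b, residuals = least_squares(xs, ys)
    p = runs_test(residuals)
    return {"a": a, "b": b, "p_random": p, "search_library": p < alpha}
```

    A curved relationship fitted with a straight line leaves long runs of same-signed residuals, so the test fails and search_library is set; that is the point at which the system described above would look for a matching reference residual pattern.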

    The next relate function asks for a study of the relationships among variables in the current data set. "Relate current data, all with current data, all over samples, all" results in the factor analysis of the data correlation matrix. The loadings of the factor analysis are interpreted for the variable relationships seen along the orthogonal axes of the original rotation. This interpretation is an expert systems based analysis of the major peak series and will be discussed later.

    Interpretation of the factor score is a difficult problem. Consider the two dimensional factor score projection of these data. Figure 5a is a projection without labels. Figure 5b is the same projection after a human expert has assigned sample labels corresponding to the geological source of the samples and an interpretation. Figure 5a is representative of the information about the factor scores stored by the computer. The addition of labels is readily accomplished but, without training, patterns formed by the sample labels

    Table 2. Variations of the expert system commands RELATE and INTERPRETE USING. The FORTRAN subroutine calling protocols are based on the data types used in each variable location of the command and on the command sequence. Note that the same command strings and rules can accommodate computation on questions about the data base transpose matrix when "... OVER samples ..." is replaced by "... OVER variables ...".

    General form:
    RELATE data base 1, variable list 1 TO data base 2, variable list 2 OVER samples, sample list

    Examples:
    1. RELATE current data, mass 34 TO old data, organic sulfur OVER samples, match
       (Results: correlation coefficient and significance test)
    2. RELATE current data, mass 34 TO old data, organic sulfur OVER samples, match; INTERPRETE USING old data, organic sulfur
       (Results: correlation computed at least squares fit, significance test, and residual pattern evaluation)
    3. RELATE current data, all TO current data, all OVER samples, all
       (Results: Factor analysis of current data and loading interpretation)
    4. RELATE current data, all TO old data, calorific value OVER samples, match, active; INTERPRETE USING old data, calorific value
       (Results: Factor analysis of current data followed by target rotation to calorific value and interpretation of loadings)
    5. RELATE current data, all TO old data, all OVER samples, match
       (Results: Canonical correlation analysis of two data bases and interpretation of mass spectral loadings)

    within the space have little meaning. A generalized solution to this problem does not seem likely. Each study would require an elaborate knowledge base specific to the samples in order to interpret trends seen in this picture.

    For the coal data, the old data matrix of conventional measurements provides a more easily implemented route for hypothesis testing on the pyrolysis mass spectral factors. Consider once more the relate command. After matching the two data sets by the logic used in [5], factor analysis of the Py/MS set followed by "Relate current data 2, all with old data, calorific value using samples, active; interpret using old data, calorific value" results in the regression analysis of calorific value = W (factor scores)_Py + b, producing both the variable and the sample relationships of the rotation of the Py/MS data to calorific value. The results are given in figure 6 and discussed in [6].

    The last example of a correlation based statement is "Relate current data, all with old data, all over samples, all." This results in a canonical correlation analysis factoring both data matrices in such a way that the overlap in information between the sets is maximized. Four factors are extracted. The first two of these are shown in figures 7 through 9. This analysis showed these data sets to be approximately 80% correlated over the coal samples. The major chemistry of the conventional measurement trends is given above the spectral representation from the Py/MS factors.

    Figure 5. Comparison of computer representation of factor scores (a) to the same plot after interpretation by an expert (b). Symbols represent sample specific relationships deduced by the expert. Dotted lines are a separation of the samples into classes not available to the computer in the pyrolysis data base. The knowledge base required to mimic the expert would form a reference book of information about the samples. The generation of such a knowledge base would be a formidable task.

    (Figure 5, panels (a) and (b): factor score projections; horizontal axis, factor 1 score.)

    Figure 6. Chemical interpretation of the mass spectral peaks associated strongly with a targeted least squares rotation of pyrolysis factor scores to the external parameter calorific value. Peak interpretations were accomplished by the system described in figure 10. Results on chemical interpretation are identical to those of the original study, demonstrating that chemical information is more readily implemented as an expert system than sample specific information for this instrumental method.
    (Spectrum labels: alkenes, alkyl fragments, naphthols, indanols, phenol, ketones, benzene, benzofurans (indanes), dihydroxybenzenes, carboxy acids (alcohols).)

    Figure 7. First canonical variate loadings for the rotation of pyrolysis mass spectra of coal samples to the conventional data matrix described in table 1. The correlation of the data bases to the derived variate (Z) is 0.99 and to each other is 0.95. AcXc and ApXp are the linear composites of the conventional and pyrolysis data bases respectively. Only the signs of the strongly correlated variables of the conventional data are given. Loadings for the pyrolysis data are given as positive and negative. Interpretations of mass peaks were accomplished using the system described in figure 10.
    (Panel annotations: R(Z,AcXc) = R(Z,ApXp) = 0.99; R(AcXc,ApXp) = 0.95. Conventional measurement set: calorific value, organic oxygen, moisture, organic carbon, reflectance. Pyrolysis mass spectral data set labels: fragments, alkenes, naphthalenes; horizontal axis m/z.)

    Figure 8. Second canonical variate loadings for the rotation of pyrolysis mass spectra to the conventional data matrix described in table 1. See figure 7 for an explanation of the symbols used in this figure. Interpretation by the system described in figure 10 failed for the positive loadings labeled as "isoprenoid signals?". These were assigned by the authors.
    (Panel annotations: R(Z,AcXc) = R(Z,ApXp) = 0.96; R(AcXc,ApXp) = 0.85. Conventional measurement set: semifusinite, sodium, volatiles, reflectance. Pyrolysis mass spectral data set labels: isoprenoid signals, benzenes, naphthalenes, phenanthrenes, anthracenes; horizontal axis m/z.)

    Figure 9. Canonical variate scores on the first two axes (described in figs. 7 and 8). Dotted lines separating classes were inserted by the authors and not interpreted by the expert system (see fig. 5 for an explanation of the implementation problem for sample interpretation).
    (Class labels: high volatile bituminous A; high volatile bituminous B; high volatile bituminous C; subbituminous. Horizontal axis: first canonical component.)

    The assignment of chemical interpretations for mass spectra and for factor loadings is accomplished by an expert systems interpreter that can be used for any study in which the samples are coal. The data base for the interpretation includes commonly encountered chemical species in coal, the major peaks expected from their presence and, given two chemical species with similar patterns, the probability of their contribution as a major component to a coal pattern. Also included is a routine for generating molecular species from C, N, O, and S given the base peak molecular weight and the ion series. The operation of the expert system along with the results of each iteration are given in figure 10. Because Py-MS spectra contain many low intensity masses of questionable interpretation, and because factor loadings are rarely pure, only the most intense ion series patterns are interpreted. The program is initialized by setting an initial threshold limit (TL3). The ions (m/z) above the threshold are collected, sorted and given temporary chemical assignments. The threshold limit is lowered each time by a lesser factor until it encounters the "grassy" region of the spectrum. At this point, permanent ion series interpretations are assigned. The "?" appearing in the figure for mass 104 means that no library interpretation of this peak was found and that the number of peaks in the ion series was below the limit set for generation of hypothetical molecular species. The entire system is diagrammed in figure 11.
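    The collect, sort, and assign steps for a single threshold can be sketched as follows. This is a simplified illustration: the successive lowering of the threshold, the use of base peaks to terminate ambiguous series, and the generation of hypothetical molecular species are all omitted. The small LIBRARY dictionary is a hypothetical subset keyed on the lowest mass of a series, with entries read from the series shown in figure 10, and the names ion_series and interpret are assumptions.

```python
# Hypothetical subset of the interpretation data base, keyed by lowest series mass.
LIBRARY = {
    28: "alkene",
    68: "diene",
    92: "alkylbenzene",
    94: "alkylphenol",
    110: "dihydroxybenzene",
}


def ion_series(masses, step=14):
    """Group masses into series whose members differ by `step` (14 = CH2)."""
    ordered = sorted(set(masses))
    present = set(ordered)
    used = set()
    series = []
    for m in ordered:
        if m in used:
            continue
        chain = [m]
        # Extend the chain while the next homologue is present and unassigned.
        while chain[-1] + step in present and chain[-1] + step not in used:
            chain.append(chain[-1] + step)
        used.update(chain)
        series.append(chain)
    return series


def interpret(spectrum, threshold):
    """Assign each ion series above the threshold; "?" marks a series not in the library.

    spectrum: dict mapping m/z to intensity.
    """
    masses = [m for m, intensity in spectrum.items() if intensity >= threshold]
    return [(chain, LIBRARY.get(chain[0], "?")) for chain in ion_series(masses)]
```

    With a threshold that excludes the low intensity masses, a spectrum containing 94, 108, and 122 is reported as one alkylphenol series, while an isolated peak at 104 comes back as "?", mirroring the behavior described for mass 104 above.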

    Discussion

    The correlation work described in this paper, along with similar considerations of other algorithms, seems to support the possibility that, for a given instrument, an expert systems driver for data analysis can be developed which is independent of the nature of the chemical problem. The way to accomplish this is to build expert systems drivers to interpret the problem statement and to interpret the data analysis results. Viewed in this manner, an instrument dependent expert system can generate the experimental design and optimization from the problem statement and pass to the data analysis driver the statistical elements necessary for its decision on the proper data reduction protocol. Learning is initiated when the data analysis system receives an unrecognizable set of elements. Otherwise, the expert system selects from the knowledge base the algorithm sequence for data reduction.

    Figure 10. Operation of the expert system used to interpret the pyrolysis mass spectra and spectral loadings in the demonstration. The rules used to sort and interpret the peaks are based on ion series produced by mass differences of 14 (CH2). Base peaks help terminate series for resolution of multiple interpretations. "?" is a peak not present in the data base.
    (Primary assignments shown include: 28, 42, 56, 70, CnH2n, alkene; 68, 82, CnH2n-2, diene; 92, 106, 120, 134, 148, 162, alkylbenzene; 94, 108, 122, 136, alkylphenol; 110, 124, 138, dihydroxybenzene; 104, ?.)

    We are currently extending these concepts to the construction of an expert system, EXMAT, for experimental design, optimization, data reduction and interpretation of measurements made on a sample by a variety of thermal analysis instrumental techniques. TIMM (General Research Corp., McLean, VA), a FORTRAN-based expert system generator, has enabled development of a heuristically-linked set of expert systems for material analysis. The attributes of the TIMM system are listed in table 3. In order to accomplish the analytical goals of this project, we have combined the concepts of specific instrumental intelligence with the goals set forth in figure 1a to provide a data system capable of experimental designs utilizing one or several analysis techniques. Each instrument retains its own control, design, preprocessing and interpretative expert systems unit but the data analysis unit has, of necessity, been generalized for analysis of data from a single or from


    Figure 11. Steps in the expert system interpretation of pyrolysis mass spectra of coals.

    [Flowchart boxes: SET THRESHOLD INTERVALS (TL); COLLECT PEAKS IN (TL)max; ELIMINATE FORMULAS; SORT PEAKS FOR ALKYL LOSS; GENERATE FINAL STRUCTURES.]
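The interpreter is described as including a routine that generates molecular species from C, N, O, and S given a base peak molecular weight. The paper does not list the rules that routine uses, so the following is only a minimal sketch of one way to enumerate candidates. It assumes nominal integer masses (C = 12, H = 1, N = 14, O = 16, S = 32), neutral closed-shell molecules, a hydrogen ceiling of 2C + N + 2, and a hypothetical cap of two atoms for each heteroatom. The function names are invented for illustration.

```python
def candidate_formulas(nominal_mass, max_hetero=2):
    """Enumerate (C, H, N, O, S) counts whose nominal mass equals `nominal_mass`."""
    results = []
    for s in range(max_hetero + 1):
        for o in range(max_hetero + 1):
            for n in range(max_hetero + 1):
                rest = nominal_mass - 32 * s - 16 * o - 14 * n
                if rest < 12:
                    continue  # no room left for even one carbon
                for c in range(1, rest // 12 + 1):
                    h = rest - 12 * c
                    # Saturation limit: at most 2C + N + 2 hydrogens.
                    if h > 2 * c + n + 2:
                        continue
                    # Assumed neutral closed-shell species: H + N must be even.
                    if (h + n) % 2:
                        continue
                    results.append((c, h, n, o, s))
    return results


def format_formula(counts):
    """Render a (C, H, N, O, S) tuple as text, e.g. (6, 6, 0, 1, 0) -> 'C6H6O'."""
    parts = []
    for symbol, count in zip("CHNOS", counts):
        if count:
            parts.append(symbol + (str(count) if count > 1 else ""))
    return "".join(parts)


# Candidates for nominal mass 94; the list includes C6H6O (phenol) and C7H10.
for counts in candidate_formulas(94):
    print(format_formula(counts))
```

A real interpreter would then prune this list, for example against the library of species expected in coal, which is the role the "eliminate formulas" box appears to play in the flowchart.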

    Table 3. Attributes of the TIMM® expert system. TIMM® was adapted to our application because addition of FORTRAN subroutine library calls is more readily accomplished than in other systems and because the decision logic can already use computationally based rules as well as chaining logic rules.

    TIMM®: A FORTRAN-BASED EXPERT SYSTEMS APPLICATIONS GENERATOR, General Research Corporation
    Forward/backward chaining using analog rather than propositional representation
    Knowledge base divided into two sections: declarative knowledge and knowledge body
    Pattern-matching using a nearest neighbors search algorithm to compare current situation with antecedent clauses
    Unique similarity metric computed from order information in declarative knowledge giving distance metric over all classes
    Decision structure and knowledge body readily developed and modified by expert in any domain
    Heuristically-linked expert system using implicit and explicit method permitting processing of microdecisions that are part of macrodecisions
    Each system independently built, trained, exercised, checked for consistency and completeness and then generalized

    multiple measurement techniques. Table 4 gives an outline of the decision making process for the experimental design expert system, and for data interpretation. Rule generation under the system is demonstrated in figure 12. The expert system strategy for the chemometrics portion of the system is similar to that described previously in the Py-MS example with the added feature that the system generates the protocol of analysis based on the initial problem statement and the available data. The system is in its earliest stages of development so any or all aspects of the proposed design are subject to modifications as experience is gained in training the system. Nevertheless, we feel that an expert system such as ours offers a strategy for automation of laboratory instrumentation and interpretation under an expert systems approach.

    If:
        SCOPE IS R&D
        SAMPLE AMT IS TRACE
        SAMPLE FORM IS POWDER
        SAMPLE PROCESS IS
        SAMPLE HISTORY IS
        INSTR. AVAIL IS ALL
    Then:
        ANAL STRATEGY IS SPECTRMS (100)

    Figure 12. Example of rule generation in EXMAT using the TIMM expert system. Rule is from the expert system for analytical strategy.
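Table 3 states that TIMM compares the current situation with rule antecedents by a nearest neighbors search. The paper does not give the similarity metric, so the sketch below substitutes a plain mismatch fraction for it, and it treats factors left blank in a rule as "any value". Both are assumptions. The first rule follows the one shown for analytical strategy, with the scope read as R&D. The second rule and the sample situation are hypothetical.

```python
def distance(situation, antecedent):
    """Fraction of the rule's specified factors that the situation does not match.

    Factors whose value is None are treated as unspecified (assumption).
    """
    specified = {k: v for k, v in antecedent.items() if v is not None}
    if not specified:
        return 0.0
    misses = sum(1 for k, v in specified.items() if situation.get(k) != v)
    return misses / len(specified)


def nearest_rule(situation, rules):
    """Return (distance, rule) for the antecedent closest to the situation."""
    return min(((distance(situation, rule["if"]), rule) for rule in rules),
               key=lambda pair: pair[0])


RULES = [
    # Rule taken from the analytical-strategy example; blank factors become None.
    {"if": {"SCOPE": "R&D", "SAMPLE AMT": "TRACE", "SAMPLE FORM": "POWDER",
            "SAMPLE PROCESS": None, "SAMPLE HISTORY": None, "INSTR. AVAIL": "ALL"},
     "then": "SPECTRMS"},
    # Hypothetical second rule, invented only so the search has a choice.
    {"if": {"SCOPE": "QUANT", "SAMPLE AMT": "BULK", "SAMPLE FORM": "LIQUID"},
     "then": "CHROMLC"},
]

# Hypothetical situation: differs from the first rule only in sample form.
situation = {"SCOPE": "R&D", "SAMPLE AMT": "TRACE", "SAMPLE FORM": "FILM",
             "INSTR. AVAIL": "ALL"}
score, rule = nearest_rule(situation, RULES)
print(rule["then"], score)  # prints: SPECTRMS 0.25
```

A nearest-neighbor match of this kind lets a rule fire on a situation it does not match exactly, which is one reason such a system needs to be checked for consistency and completeness after training.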


    Table 4. Example taken from overall organization structure of EXMAT, an expert system for materials analysis.

    ANALYTICAL STRATEGY
    CHOICES: CHROMGC, CHROMLC, SPECTRFTIR, SPECTRMS, THERMTA, ELEMEL
    FACTORS:
    1. SCOPE: QUAL, QUANT, PURITY, QUAL/QUANT, TIME/FUND LIMIT, TRACE, R&D, CORRELATION, SCREEN

    DATA INTERPRETATION
    CHOICES: PATTERN RECOGNITION, SAMPLE ID, GROUP ID, PEAKS ID, NO ID, COMBINE DB, EXTEND DB, CORRELATION
    FACTORS:
    1. DATA GENERATE: FTIR DB, MS DB, LC DB, TA DB, GC-FTIR DB, GC-MS DB, GC DB
    2. GC DB TREATMENT: DIRECT COMPARE, CHEMOMETRICS, DATA SET ID
    3. FTIR DB TREATMENT: DIRECT COMPARE, PAIRS SEARCH
    4. ETC. FOR EACH DATA BASE (DB)


    References

    [1] Harper, A.M., Polymer Characterization Using Gas Chromatography and Pyrolysis, S. Liebman and E.J. Levy, Marcel Dekker, Inc. (1985).
    [2] Harper, A.M.; H.L.C. Meuzelaar, G.S. Metcalf, and D.L. Pope, Analytical Pyrolysis Techniques and Applications, K. Voorhees, ed., 157-195 (1984).
    [3] Owens, P.M.; R.B. Lamb, and T.L. Isenhour, Anal. Chem., 54, 344 (1982).
    [4] McMurry, J.E.; J.A. Pino, P.C. Jurs, B. Lavine, and A.M. Harper, Anal. Chem., vol. 57, 295-302 (1985).
    [5] Meuzelaar, H.L.C. and A.M. Harper, Characterization and Classification of Rocky Mountain Coals by Curie-Point Mass Spectrometry, Fuels, vol. 63, 639-652 (1984).
    [6] Harper, A.M.; H.L.C. Meuzelaar, and P.H. Given, Fuels, vol. 63, 793-799 (1984).
    [7] Deming, S.N.; S.L. Morgan, and M.R. Willcott, American Laboratory, October, 13 (1976).
    [8] Meuzelaar, H.L.C.; J. Haverkamp, and F.D. Hileman, Curie Point Pyrolysis Mass Spectrometry of Recent and Fossil Biomaterials, Elsevier, Amsterdam (1982).
    [9] Windig, W.; P.G. Kistemaker, and J. Haverkamp, J. Anal. Appl. Pyrol., 3, 99 (1982).
    [10] Eshuis, W.; P.G. Kistemaker, and H.L.C. Meuzelaar, Analytical Pyrolysis, C.E.R. Jones, ed., Elsevier, Amsterdam, 151-166 (1977).

    DISCUSSION
    of the Harper-Liebman paper, Intelligent Instrumentation

    Richard J. Beckman
    Los Alamos National Laboratory

    There has been an instrumentation revolution in the chemical world which has changed the way both chemists and statisticians think. Instrumentation has led chemists to multivariate data, and to a great deal of it. Gone are the days when the chemist takes three univariate measurements and discards the most outlying.

    Faced with these large arrays of data the chemist can become somewhat lost in the large assemblage of multivariate methods available for the analysis of the data. It is extremely difficult for the chemist, and the statistician for that matter, to form hypotheses and develop answers about the chemical systems under investigation when faced with large amounts of multivariate chemical data.

    Professor Harper proposes an intelligent instrument to solve the problem of the analysis and interpretation of the data. This machine will perform the experiments, formulate the hypotheses, and "understand" the chemical systems under investigation.

    What impact will such an instrument have on both chemists and statisticians? For the chemist, such an instrument will allow more time for experimentation, more time to think about the chemical systems under investigation, a better understanding of the system, and better statistical and numerical analyses. There would be a chemometrician in every instrument. For the statistician, the instrument will mean the removal of outliers, trimmed data, automated regressions, and automated multivariate analyses. Most important, the entire model building process will be automated.

    There are some things to worry about with intelligent instruments. Will the chemist know how the data have been reduced and the meaning of the analysis? Instruments made today do some data reduction, such as calibration and trimming, and the methods used in this reduction are seldom known by the chemist. With a totally automated system the chemist is likely to know less about the analysis than he does with the systems in use today.

    The statistician, when reading the paper of Professor Harper, probably asks what is the role of the statistician in this process? Will the statistician be replaced with a microchip? Can the statistician be replaced with a microchip? In my view the statistician will be replaced by a microchip in instruments such as those discussed by Professor Harper. This will happen with or without the help of the statistician, but it is with the statistician's help that good statistical practices will be part of the intelligent instrument.

    Professor Harper should be thanked for her view of the future chemical laboratory. This is an exciting time for both the chemist and the statistician to work and learn together.
