Incorporating Topic Assignment Constraint and Topic Correlation...

14
Research Article Incorporating Topic Assignment Constraint and Topic Correlation Limitation into Clinical Goal Discovering for Clinical Pathway Mining Xiao Xu, Tao Jin, Zhijie Wei, and Jianmin Wang School of Software, Tsinghua University, Beijing, China Correspondence should be addressed to Tao Jin; [email protected] Received 4 November 2016; Revised 24 January 2017; Accepted 5 February 2017; Published 22 May 2017 Academic Editor: Evangelos Sakkopoulos Copyright © 2017 Xiao Xu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Clinical pathways are widely used around the world for providing quality medical treatment and controlling healthcare cost. However, the expert-designed clinical pathways can hardly deal with the variances among hospitals and patients. It calls for more dynamic and adaptive process, which is derived from various clinical data. Topic-based clinical pathway mining is an eective approach to discover a concise process model. Through this approach, the latent topics found by latent Dirichlet allocation (LDA) represent the clinical goals. And process mining methods are used to extract the temporal relations between these topics. However, the topic quality is usually not desirable due to the low performance of the LDA in clinical data. In this paper, we incorporate topic assignment constraint and topic correlation limitation into the LDA to enhance the ability of discovering high-quality topics. Two real-world datasets are used to evaluate the proposed method. The results show that the topics discovered by our method are with higher coherence, informativeness, and coverage than the original LDA. These quality topics are suitable to represent the clinical goals. Also, we illustrate that our method is eective in generating a comprehensive topic-based clinical pathway model. 1. Introduction Clinical pathway (CP) is of great importance in disease treatment, hospital management, and government regula- tion. It has been widely used to make the treatment process more standardized and normalized. The core principle of CP is to divide the clinical treatment into several stages and detail the clinical activities in each stage. Now, there are more than 300 CPs which have been designed by experts and implemented in China. Table 1 is a Chinese CP example about intracerebral hemorrhage (ICH). However, these CPs can be hardly put into practical application due to their static and nonadaptive feature. Recently, clinical pathway mining (CPM) has experienced increased attention, which refers to using data mining tech- nologies to discover the execution CP from hospital historical data. In comparison with the expert-designed CP, the data- driven execution CP is more objective. It can facilitate the CP (re)design, CP implementation, and abnormal detection. The most current CPM approaches are based on the process mining technologies, which are popular in extracting process models from event logs. The resultant graph models contain two basic elements: nodes represent clinical activities and edges represent the order relations between these activi- ties. However, the clinical data are too unstructured and complex for process mining methods. They usually result in spaghetti-like models which are comprised by massive nodes and edges. To make the process model more comprehensive, researchers attempt to manually predene the key activities for process mining. It is eective in promoting the model conciseness and highlighting core information, while the manual works are always expensive and subjective. Some other approaches focus on using topic modeling methods to discover clinical patterns instead of process models. They assume that the treatment for a disease can be concluded into a number of patterns. For a patient, a doctor would choose one pattern for him/her according to the symptoms. To achieve this target, topic modeling Hindawi Journal of Healthcare Engineering Volume 2017, Article ID 5208072, 13 pages https://doi.org/10.1155/2017/5208072

Transcript of Incorporating Topic Assignment Constraint and Topic Correlation...

Page 1: Incorporating Topic Assignment Constraint and Topic Correlation …downloads.hindawi.com/journals/jhe/2017/5208072.pdf · 2019-07-30 · Goal LDA (CDG-LDA). As shown in Figure 2,

Research ArticleIncorporating Topic Assignment Constraint and TopicCorrelation Limitation into Clinical Goal Discovering for ClinicalPathway Mining

Xiao Xu Tao Jin Zhijie Wei and Jianmin Wang

School of Software Tsinghua University Beijing China

Correspondence should be addressed to Tao Jin jintao05gmailcom

Received 4 November 2016 Revised 24 January 2017 Accepted 5 February 2017 Published 22 May 2017

Academic Editor Evangelos Sakkopoulos

Copyright copy 2017 Xiao Xu et al This is an open access article distributed under the Creative Commons Attribution License whichpermits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Clinical pathways are widely used around the world for providing quality medical treatment and controlling healthcare costHowever the expert-designed clinical pathways can hardly deal with the variances among hospitals and patients It calls formore dynamic and adaptive process which is derived from various clinical data Topic-based clinical pathway mining is aneffective approach to discover a concise process model Through this approach the latent topics found by latent Dirichletallocation (LDA) represent the clinical goals And process mining methods are used to extract the temporal relations betweenthese topics However the topic quality is usually not desirable due to the low performance of the LDA in clinical data In thispaper we incorporate topic assignment constraint and topic correlation limitation into the LDA to enhance the ability ofdiscovering high-quality topics Two real-world datasets are used to evaluate the proposed method The results show that thetopics discovered by our method are with higher coherence informativeness and coverage than the original LDA These qualitytopics are suitable to represent the clinical goals Also we illustrate that our method is effective in generating a comprehensivetopic-based clinical pathway model

1 Introduction

Clinical pathway (CP) is of great importance in diseasetreatment hospital management and government regula-tion It has been widely used to make the treatment processmore standardized and normalized The core principle ofCP is to divide the clinical treatment into several stages anddetail the clinical activities in each stage Now there are morethan 300 CPs which have been designed by experts andimplemented in China Table 1 is a Chinese CP exampleabout intracerebral hemorrhage (ICH) However these CPscan be hardly put into practical application due to their staticand nonadaptive feature

Recently clinical pathwaymining (CPM) has experiencedincreased attention which refers to using data mining tech-nologies to discover the execution CP from hospital historicaldata In comparison with the expert-designed CP the data-driven execution CP is more objective It can facilitate theCP (re)design CP implementation and abnormal detection

The most current CPM approaches are based on theprocess mining technologies which are popular in extractingprocess models from event logs The resultant graph modelscontain two basic elements nodes represent clinical activitiesand edges represent the order relations between these activi-ties However the clinical data are too unstructured andcomplex for process mining methods They usually result inspaghetti-like models which are comprised by massive nodesand edges To make the process model more comprehensiveresearchers attempt to manually predefine the key activitiesfor process mining It is effective in promoting the modelconciseness and highlighting core information while themanual works are always expensive and subjective

Some other approaches focus on using topic modelingmethods to discover clinical patterns instead of processmodels They assume that the treatment for a disease canbe concluded into a number of patterns For a patient adoctor would choose one pattern for himher accordingto the symptoms To achieve this target topic modeling

HindawiJournal of Healthcare EngineeringVolume 2017 Article ID 5208072 13 pageshttpsdoiorg10115520175208072

technologies such as latent Dirichlet allocation (LDA) [1]and its variants are applied on the clinical data [2ndash5] Theymodel each patient trace as mixtures of multiple topics whileeach topic is modeled as a multinomial distribution overclinical activities However these approaches can hardlyreflect the temporal structure between clinical activitieswhich are important for CP

In our previous work [6] we take the advantages ofboth topic modeling and process mining to generate a

concise and interpretable topic-based process model Thekey principle is to use LDA to discover the latent topicsfrom the clinical data and extract the order relationsbetween these topics by process mining methods It is suit-able to regard the latent topics as the clinical goals based onthe following two clinical practices

(1) The clinical activities occurring in a specific timeduration (usually a hospitalized day) are prescribedaround several clinical goals

(2) Each clinical goal corresponds to a clinicalactivity set

Take the first day of ICH as the example (as shown inTable 1) For a new patient the doctors would make adiagnosis for himher by evaluating the intracranial pressure(ICP) and try to control the ICP into the normal range Theseconstitute the clinical goals for the first day To achieve thesegoals doctors have a series of corresponded choices BrainCT and the related medical imaging are suitable for theformer goal Dehydration drugs like mannitol and glycerinfructose are very useful for the latter goal Thus each clinicalday would be represented as several topics instead of thedetailed clinical activities Moreover the process miningmethod is effective in discovering the temporal structure inthese high-level sequences

However the raw LDA can hardly ensure the quality ofthe discovered topics which is critical to the interpretabilityof the final process model By carefully analyzing the clinicalscenario we find two common problems in applying rawLDA on clinical data

(1) The same clinical activities occurring in one clinicalday may be assigned different topics

(2) A clinical activity may have strong correlation tomany topics

We demonstrate two examples to explain them (1) InFigure 1(a) the four penicillin prescribed by the doctor inone clinical day are assigned to four different topics whilein the clinical scenario they are both used for the sameclinical goal which means that they should share the sametopic (2) In Figure 1(b) the clinical activity syringe rankshigh in all the three discovered topics However each clinicalactivity should have relative limited number clinical goalsrather than various clinical goals

To address the above problems we proposed a novelextension of LDA for CPM which is called Clinical DailyGoal LDA (CDG-LDA) As shown in Figure 2 we firstintroduce a constraint into LDA which can ensure thesame clinical activity in one clinical day would be assignedto the same topic (topic assignment constraint) It canimprove the consistency of the topic assignment procedurefor clinical goal discovering Second we present an effi-cient method to adaptively limit the topic number of eachclinical activity and adjust its ranking in the topicaccording to their informativeness (topic correlation lim-itation) It guarantees that a clinical activity would onlyrank high in limited clinical topics Third an effective

Table 1 The national clinical pathway of intracerebral hemorrhagereleased by the Ministry of Health of China

Stage Order

Stage 1(Day 1)

Long-term medical order

(1) neurology nursing routine (2) level I care(3) normal diet (4) keep the bed and (5) observingvital signs

Temporary medical order

(1) blood urine and stool routine examination(2) hepatorenal function electrolytes blood glucoseblood lipids cardiac enzymes coagulation functionand infectious disease screening (3) brain CT chestX-ray and ECG and (4) when necessary brain MRICTA MRA or DSA

Stage 2(Day 2)

Long-term medical order

(1) neurology nursing routine (2) level I care(3) normal diet (4) keep the bed (5) observing vitalsigns and (6) basic drugs

Temporary medical order

(1) re-examination for abnormal laboratory valuesand (2) when necessary re-examination CT

Stage 3(Day 3)

Long-term medical order

(1) neurology nursing routine (2) level I care(3) normal diet (4) keep the bed (5) observing vitalsigns and (6) basic drugs

Temporary medical order

(1) re-examination for abnormal laboratory values

Stage 4(Days 4ndash6)

Long-term medical order

(1) neurology nursing routine (2) level II care(3) normal diet (4) keep the bed (5) observing vitalsigns and (6) basic drugs

Temporary medical order

(1) re-examination for abnormal laboratory values(2) re-examination for blood kidney functionblood glucose and electrolytes and (3) whennecessary re-examination CT

Stage 5(Days 7ndash13)

Long-term medical order

(1) neurology nursing routine (2) level II-III care(3) normal diet (4) observing vital signs and(5) basic drugs

Temporary medical order

(1) re-examination for abnormal laboratory valuesand (2) when necessary DSA CTA and MRA

Stage 6(Days 8ndash14)

Long-term medical order

(1) discharge with drugs

2 Journal of Healthcare Engineering

process mining framework is applied on the topic-basedsequences to get a comprehensive process model It isworth mentioning that we use the billing data as ourexperimental data Compared to other various kinds ofclinical data billing data is the most common and cred-ible one due to its importance in insurance area And itcan be easily extracted from hospital information systemwithout complex data integration technologies Billingdata is also of the lowest data granularity and containsa lot of noise which brings a great difficulty for CPMtask Thus the CPM approach on the billing data isboth valuable and challenged

Our main contributions in this paper are as follows

(i) We incorporate topic assignment constraint intoLDA to make the inference procedure consis-tent for the same clinical activities in oneclinical day

(ii) Informativeness is introduced to adjust theranking in each topic which can improve thetopic quality

(iii) Experiments on real-world datasets demonstrate thegreatperformanceofourapproach forCPMbyapply-ing process mining on the quality discovered topics

We first give the definitions used in this paper

Definition 1 (clinical activity) A clinical activity is an eventoccurring at a particular time point which refers to the basicelement in treatment

Definition 2 (clinical day and patient trace) A patient tracecontains a series of clinical activities We segment eachpatient trace into a collection of clinical days and a clinicalday represents a set of clinical activities occurring in the samehospitalized day

2 Related Work

Early in 2001 Lin et al [7] pointed out the shortage of theexpert-designed CPs in handling the great variances causedby individual differences So that a more dynamic and

Penicillin penicillin penicillin penicillin and others

Antibacterial drugImaging

Nursing care

Blood exam

Topic imaging

(1) CT(2) Syringe(3) MRI

Topic nursing

(1) Syringe(2) Mouth care(3) Skin care

Topic drug

(1) Syringe(2) Penicillin(3) Vitamin

⦙ ⦙ ⦙

(a) (b)

Figure 1 Two problems in applying LDA on clinical data (a) the clinical activities in one day for a patient and (b) the three discovered topics

Visit ID Item name Amount Occurrence time

T1T1T1T1T1

T2T2

ABACD

BC

32125

42

20146122014612201461220146122014613

20151132015113

⦙ ⦙ ⦙ ⦙

⦙ ⦙ ⦙ ⦙

⦙ ⦙

⦙ ⦙⦙ ⦙

⦙ ⦙

A A A B B A C C

D D DD D

B B BB CC

Day 1

Day 2

Day 1

G2G1 G3T1 Topic 1 A 06 B 03 and others

Topic 2 C 07 D 01 and others

Topic 3 B 04 C 03 and others

Goal 1 imaging CT MRI and others

Goal 2 blood examCreatinine sodium and others

Goal 3 medication Penicillin vitamin and others

Data preparation Topic modeling Process mining

T1

Goal 1 Goal 2

Day 1 Day 2hellip

T2

Goals 3 and 1 Goal 1

Day 1 Day 2hellip

T1

Topic 1 Topic 2

Day 1 Day 2hellip

T2

Topics 3 and 1 Topic 1

Day 1 Day 2hellip

Busin

ess d

omai

nA

lgor

ithm

dom

ain

Pipeline

a1 a2 a3 a4 a5

T2

VOC

ABCD

Goals 3 and 1

Goal 1

Goal 2

Topic assignment constraint Topic correlation limitation

Topics 3 and 1

Topic 1

Topic 2

Figure 2 The illustrative process of our approach The top part is the pipeline middle part is the business domain and bottom part is thecorresponding algorithm domain (T patient trace a clinical activity day clinical day G group VOC vocabulary)

3Journal of Healthcare Engineering

adaptive process is needed for improving the performance ofCPs They proposed a graph mining technique to extract thetime dependency pattern of CPs for curing brain stroke Thepattern can be used for predicting the paths for a newpatient In [8] hidden Markov model (HMM) was adoptedfor inferring the CP pattern which was useful in knowledgesharing and CP updating While these methods wereseverely limited to the data deficiency and complexity andbecause CPs focus on the order relations between variousclinical events process mining technologies are widely usedfor CPM A case study on a group of 627 gynecologicaloncology patients was presented in [9] by using heuristicsminer [10] which resulted in a spaghetti-like process modelLang et al [11] evaluated the performance of seven tradi-tional process mining methods on clinical data The resultsdemonstrated that these methods can hardly obtain compre-hensive process models on clinical scenario

Researchers attempted to adopt frequency-basedmethodsto improve the ability of analyzing the complex processmodels In [12] the patient traces with similar schemasand outcomes are grouped together In [13] sequenceclustering was used to identify regular behavior processvariants and infrequent behavior on the process modeldiscovered by process mining method Zhang et al [14] pro-posed a frequency-based method to group the patients intoseveral predefined core dimensions Lakshmanan et al [15]presented a CPM framework that combined clusteringprocess mining and frequent pattern mining technologiesto mine an outcome-related process model In [16] asegmentation method was proposed to discover the frequentbehavior patterns with different time intervals CareExplorer[17] was a novel CP management tool which combinedfrequent sequence mining techniques with advanced visu-alization supports Manually defining a set of concernedand important clinical activities can significantly improvethe interpretability of the process models [18ndash20] How-ever manual works are always subjective and expensivefor CPM

Considering the complexity and quantity of clinical datatopic modeling technologies are used to discover the clinicalpatterns which are useful in CP (re)design and abnormaldetection Huang et al [2] firstly introduced LDA to extractlatent topics as the treatment patterns from clinical dataEach patient trace was regarded as a mixture of differenttreatment patterns In addition the team integrated timestamps [3] patient features [4] and comorbidities [5] intoLDA to enhance the ability of summarizing patterns How-ever treating a patient trace as a whole can hardly representthe feature of CP that the treatment process is consisted ofseveral stages with different clinical goals To deal with thisproblem our previous work [6] puts the emphasis on dis-covering the clinical goals from clinical data by LDA andderiving the order relations between these goals by processmining method It was effective in generating a conciseordinal and interpretable process model as CP While asdiscussed in [21] the performance of this approach isquite limited to the effectiveness of LDA

In this paper we extend the work in [21] to enhance theability of discovering quality clinical goals by incorporating

the topic assignment constraint and topic correlation limita-tion into LDA

3 Methodology

We first introduce the conversion from billing data to theword-count format for LDA Then we present a brief reviewof applying LDA on clinical data After analyzing the twoshortages of the raw LDA we incorporate topic assignmentconstraint and topic correlation limitation into LDA toimprove the performance of discovering latent topics fromclinical data

31 Data Preparation Billing data is the most commondata recorded in hospital information system (an exam-ple is shown in the upper-left part of Figure 2) Fourbasic attributions are essential for the topic modelingtrace ID (the identifier for one patient trace) itemname amount and occurrence time (in the unit ofday) Each billing item (one row in the table) representsa collection of clinical activities such as the first rowltA 3gt means there are three clinical activity A Notethat the amount of different kinds of items should benormalized into the same scale first We denote theclinical activity domain as the vocabulary The billingitems with the same trace ID and occurrence timeconstitute one clinical day With respect to LDA theclinical day with clinical activities corresponds to thedocument with words

32 Brief Introduction for LDA LDA is one of the mostpopular statistical topic modeling technologies It modelsthe generative process of each word from each documentin a text dataset There are two core model parametersin LDA a corpus-level distribution over words for eachtopic and a document-level distribution over topics foreach document In clinical data by regarding the clinicalday and clinical activity as the document and wordrespectively the generative process of LDA has great con-formity with the two abovementioned clinical practicesThe graphical representation is shown in Figure 3(a) andthe notations are explained in Table 2 The generative processof LDA is as follows

(1) Draw distribution ϕksimDir β k = 1 2hellip K(2) For dth clinical day d = 1 2hellipD

(a) Draw distribution θdsimDir α

(b) For ith clinical activity in dth clinical dayi = 1 2hellipNd

(i) Draw zdisimMulti θd(ii) Draw adisimMulti ϕzdi

According to the graphical model we need an infer-ence to find the parameters which can best match theobserved clinical data Gibbs sampling a special case ofMarkov chain Monte Carlo (MCMC) is the most popular

4 Journal of Healthcare Engineering

inference method for LDA It is based on the joint dis-tribution of latent topics and observed clinical activitiesas follows

P Z A α β = P Z α sdotP A Z β

= P Z θ P θ α dθ sdot P A Z ϕ P ϕ β dϕ

prop prodK

k=1prodD

d=1Γ N k

d + αk sdotprodV

v=1Γ N vk + βv

Γ Nk +V

v=1βv

1where Γ middot is the gamma function

Given the joint distribution the topic assignment foreach clinical activity can be written as follows

P zdi = k Z zdi A = P Z AP Z zdi A

propN v

k zdi + βv

V

v=1Nvk zdi + βv

sdot N kd zdi + αk

2

After the Gibbs sampling the two important hiddenparameters ϕ and θ are computed as follows

ϕ vk = N v

k + βv

V

v=1Nvk + βv

θ kd = N k

d + αk

K

k=1Nkd + αk

3

where ϕ vk is the probability of clinical activity v which

belongs to topic k and θ kd is the probability of the dth clinical

day which contains topic k

33 Topic Assignment Constraint We can see that theinference procedure is mutually independent for all theclinical activities (as shown in (2)) So that even the sameclinical activities in one clinical day may get different topicswhich violates to the clinical practice

In this section we adopt the chain graph proposed in [21]to incorporate topic assignment constraint into LDA It cancoerce the same clinical activities on one clinical day to

Table 2 Meanings of the notations

Notation Meaning

D The set of clinical days

A The set of clinical activities

Z The set of topics assigned to A

D K The number of clinical days and topics

V The number of unique clinical activities

Nd The number of clinical activities in dth clinical day

NkThe number of clinical activities that are assigned

topic k

GdThe number of groups (unique clinical activities)

in dth clinical day

Agd

The number of clinical activities in gth group indth clinical day

adi The ith clinical activities in dth clinical day

zdi The topic for adiagd The clinical activities of gth group in dth clinical day

agdj The jth clinical activity in gth group in dth clinical day

zgdj The topic for agdj

zgdzgdj

Agd

j=1 the set of topics for gth group in dth

clinical day

N kd

The number of clinical activities that are assignedtopic k in dth clinical day

N vk

The number of clinical activities with value v and topic k

N kd zgd

The number of clinical activities that are assigned topick in dth clinical day except gth group in dth clinical day

N vk zgd

The number of clinical activities with value v andtopic k except gth group in dth clinical day

α β Dirichlet prior vector

ϕ θ Multinomial distribution over clinical activitiesand topics

ϕkMultinomial distribution over clinical activities for

topic k

θd Multinomial distribution over topics for dth clinical day

휃d

k

zdi

adi

i

d

kisin[1 K] [1 Nd][1 D]

isin

isin

d

k

d1

d1

d

k [1 K]

d2

d2

ds

ds

s= d

g g g

g g g

g [1Gd][1D]

isin isin

isin

(a) (b)

Figure 3 Graphical representation of (a) LDA and (b) CDG-LDA

5Journal of Healthcare Engineering

take on the same topic The graphical model is shown inFigure 3(b) In each clinical day we first put all the sameclinical activities into a group and then use indirect edgesto connect them A function Cg

d is imposed to express thetopic assignment constraint as follows

C zgd = 1 if zgd1 = zgd2 =⋯ = zgdAg

d

0 otherwise4

For each group Cgd would equal to 1 only when all the

clinical activities in the group get the same topic Our goalis to make all the groups conform to the constraint Sothat we add Cg

d in the joint distribution of LDA (1) as follows

Pprime Z A = 1Δ P Z A prod

dgC zgd 5

where Δ is the normalization constantSimilar to the Gibbs sampling algorithm for LDA we

sample a topic for a group based on its posteriorPprime zgd = k Z zgd

A where zgd = k represents the set zgd which

has the same topic k (the notations are explained in Table 2)

Pprime zgd = k Z zgd A = P Z A

P Z zgd A

propΓ N k

d + αk

Γ N kd zgd

+ αk

sdotΓ N

αgd

k + βαgd

Γ Nk +V

vβv

Γ Nαgd

k zgd+ βα

gd

Γ Nk zgd+V

vβv

=Γ N k

d zgd+ Ag

d + αk

Γ N kd zgd

+ αk

sdotΓ N

agdk zgd

+ Agd + βα

gd

Γ Nagd

k zgd+ βα

gd

sdotΓ Nk zgd

+V

vβv

Γ Nk zgd+ Ag

d +V

vβv

= prodAgd

j=1N k

k zgd+ αk + jminus 1

sdotN

αgd

k zgd+ βα

gd+ jminus 1

Nk zgd+V

vβv + jminus 1

6

34 Topic Correlation Limitation The ranking of a clinical

activity v in a topic k is determined by ϕ vk From (3) it is

observed thatN vk is critical to the calculation of ϕ v

k In addi-

tion the sum of N vk among all topics is a fixed value which

equals to the frequency of clinical activities v (denoted as

N v ) in all clinical days Due to the independence of theinference procedure of LDA the frequency of v may be uni-formly allocated to various topics Thus a high-frequencyclinical activity is much easier to rank high in many topicswhile a low-frequency clinical activity can hardly rank highin even one topic It results in a large redundancy betweendifferent topics so that each topic is lacking of characteristicsIt is hard to regard these topics as the clinical goals

We consider using informativeness feature to solve thisproblem Our insight is that for an informative clinicalactivity it should rank high in a small number of topicswhich means that it can better represent these clinical goalsthan other uninformative clinical activities We incorporatetopic correlation limitation into LDA to achieve this targetThis limitation contains two aspects

(1) Restrict the distribution of N v over topics Higherinformativeness and less relevant topics

(2) Associate the calculation of ϕ vk with the informative-

ness of v

We use inverse document frequency (IDF see (7)) tomeasure the informativeness of each clinical activity Ingeneral informative clinical activities are expected to havehigher IDF and hence less relevant topics and higher ranking

IDF v = log Dd isin 1D v isinDd

7

Accordingly the new calculation of ϕ vk is denoted

as follows

ϕvk = ϕ v

k sdot IDF v 8

The core principle for the first aspect is to keep discardingthe most irrelevant topic for a clinical activity It is achievedby maintaining a relevant topic list for each clinical activityduring the iteration of Gibbs sampling The pseudocode forthis procedure is shown in Algorithm 1

The topic number upper bounder refers to the maximumnumber of the topics a clinical activity can correlate to Forexample given a clinical activity ECG and its topic numberupper bounder s ECG = 3 its N v can be only allocated tothree topics We set s v from 1 to K according to its IDFThen for each clinical activity value v we maintain acorresponding relevant topic list κ v In every δ iterationwe check whether the size of κ v has been reduced to thedesired number s v (line 12) To the clinical activities whichstill need to be restricted we will update their lists (line 13) bydiscarding the topic with minimum relevant value (RV) asdefined in

RV k v = N vk

rank k v 9

where rank k v is the ranking of clinical activity v in topic kbased on the multinomial distribution ϕk The numerator

N vk reflects the relevant degree of topic k to clinical activity

6 Journal of Healthcare Engineering

v And the denominator rank k v reflects the importance ofclinical activity v to topic k Thus higher RV represent morecorrelation between the clinical activity and the topic

By keeping on updating the relevant topic lists wegradually reduce the dimension of topic selection to thedesired range For instance suppose there are K = 5 topicsand a clinical activity ECG with a RV vector 50 150 5100 60 and s ECG = 3 the relevant topic list k ECGwould be changed from 1 2 3 4 5 to 1 2 4 5 in the nextupdating procedure As seen in line 15 of Algorithm 1 we dothe topic sampling from relevant topic list instead of all thetopics To the preinstance it means that we would sample atopic from topics 1 2 4 and 5 without topic 3 because ofits minimum RV

The complexity of deriving a sample zgd is K where K is the number of topics Therefore the overall com-plexity is KδGK = K2δG δ is the number of iterationfor updating relevant topic list and G =sumD

d Gd In practice Kwould not be a large value and δ could be treated as a con-stant so that G is the key factor that contributes to thecomplexity It means that the proposed method scales lin-early as we increase the total number of groups

35 Process Mining Instead of the detailed and complexclinical activities we use the discovered topics to representeach clinical day Hence each patient can be regarded as atopic-based sequence We adopt the process mining frame-work proposed in [6] to derive a concise and interpretableprocess model

4 Experiments

In this section we conducted a series of experiments todemonstrate the effectiveness of the proposed methods indiscovering quality CP model We begin with the descriptionof experimental settings

41 Experimental Settings To evaluate both the suitabilityand generality of our approach we collected two real-world billing datasets from a city-centre hospital Thetwo datasets are about two different diseases an internaldiseasemdashintracerebral hemorrhage (ICH) and a surgicaldiseasemdashinguinal hernia (IH) The detailed statistics aresummarized in Table 3 We set the Dirichlet prior toα = 1 0 and β = 0 01

The topic number K is determined by a trade-off strategyproposed in [6] It attempt to find a balance between theperplexity of topic modeling and the topic label size Theformer one is commonly used in evaluating the topic modelsrsquoprediction ability It is defined as the reciprocal geometricmean of the likelihood of the data Lower perplexity refersto better predictiveness and hence a better K while perplexityis not strongly correlated to human judgement [22] In ourapproach the size of topic label is critical to the concisenessof the final process model High topic number K wouldsignificantly reduce the interpretability of the model Thuswe select the intersection point of the two metrics acrossvarious K as the optimal topic number (K = 8 for ICH andK = 5 for IH) The iterations for updating relevant topic listδ are set to 300

42 Topic Quality The latent topics discovered by topicmodeling are interpreted as the clinical goals in ourapproach To evaluate the quality of the topics we com-pare our approach (CDG-LDA) against LDA [6] in threeaspects coherence redundancy and coverage Tables 4and 5 show the clustering results of the two methodsFor each topic we listed the top N = 10 ranked clinicalactivities and asked doctors to label it The similar topicsgenerated by different methods are in the same row Notethat given a top size N we defined (total top list) as allthe top N-ranked clinical activities among K topics Forexample refers to the 80 clinical activities for ICHwhen N = 10 and K = 8

421 Coherence We expect that the clinical activities in atopic could represent the similar clinical goal Take topic 5(the 5th row in Table 5) as the example It is observed thatthe result of LDA contains both surgery- and nursing care-related activities While in the result of CDG-LDA most ofthe clinical activities are about surgery The similar situations

Table 3 Statistics of our datasets

DiseaseTracenumber

Daynumber

Vocnumber

AvgLOS

MinLOS

MaxLOS

ICH 240 3204 752 14 2 34

IH 33 241 447 6 2 10

Input A dataset of D clinical daysTopic number K Hyper-parameters α and βIterations δ for updating relevant topic list

Output The multinomial distribution ϕ and θ1 Calculate IDF for each clinical activity2 Initialize the topic number upper bounder s v isin 1 K

for clinical activity with value v according to its IDF3 Initialize the relevant topic list κ v = 1 2hellip K for

clinical activity with value v4 Sample zgd for each group and Initialize all the count

parameters NkNkd and N v

k accordingly

5 for l = 1 to K do6 if l gt 1 then7 for v = 1 to V do8 if K minus l + 1 ge s v then9 Update κ v 10 for r = 1 to δ do11 for d = 1 to D do12 for g = 1 to Gd do13 k = zgd 14 Nk minus = Ag

d Nkd minus = Ag

d Nvk minus = Ag

d 15 Sample topic index kprime from κ v

according to (6)

16 Nkprime + = Agd N

kprimed + = Ag

d Nv

kprime + = Agd

17 Calculate and return ϕ (based on 8) and β

Algorithm 1 Gibbs sampling of CDG-LDA

7Journal of Healthcare Engineering

Table 4 Eight labeled topics of ICH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1 Imaging

(1) Doppler echocardiography (2) Dopplerultrasonography of left ventricular function(3) ventricular wall motion (4) transcranialDoppler (5) analysis of ultrasonic (6) color

Doppler ultrasonography of abdomen(7) X-radiography (8) digitized

photography (9) B mode ultrasoundand (10) film

Imaging

(1) Doppler echocardiography(2) Doppler ultrasonography of

left ventricular function (3) ventricularwall motion (4) transcranial Doppler(5) analysis of ultrasonic (6) color

Doppler ultrasonography of the abdomen(7) X-radiography (8) B mode ultrasound(9) digitized photography and (10) film

2 Medication

(1) Aceglutamide (2) sodium chloride(3) local infiltration anesthesia (4) etimicin(5) cerebrosidekinin (6) physical cooling(7) t-branch pipe (8) aerosol inhalation(9) ambroxol and (10) venous transfusion

Medication

(1) Pantoprazole (2) potassium magnesiumaspartate (3) mannitol (4) vitamin B6(5) glycerin fructose (6) vitamin C

(7) naloxone (8) piracetam (9) sodiumchloride and (10) glucose

3High-levelnursing care

(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) urinary meatus care (5) mouth care(6) skin care (7) hospital examining fee(8) mouth care package (9) ambulatory

blood pressure monitoring and(10) continuous oxygen inhalation

High-levelnursing care

(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) mouth care (5) urinary meatus care(6) skin care (7) continuous oxygeninhalation (8) mouth care package

(9) ambulatory blood pressure monitoringand (10) drainage pack

4Regular nursing

care andmedication

(1) Level II care (2) hospital examining fee(3) pantoprazole (4) vitamin B6

(5) mannitol (6) vitamin C (7) potassiummagnesium aspartate (8) venous transfusion

(9) sodium chloride and (10) glycerinfructose

Regularnursing care

(1) Level II care (2) venous transfusion(3) flusher (4) syringe (5) arterialvenous

cannula (6) therapy application(7) aceglutamide (8) intramuscular

injection (9) physical cooling venipunctureand (10) nifedipine

5Admissionexamination

(1) ECG (2) PLGA (3) fibrous protein(4) blood collection tube (5) hemostix

(6) washbasin (7) ECG event log(8) plasma prothrombin time assay

(9) activated partial thromboplastin timeand (10) prothrombin time assay

Admissionexamination

(1) Brain CT (2) venous sampling (3) ECG(4) hemostix (5) electrode (6) 12 channeldynamic electrocardiogram (7) washbasin(8) ECG event log (9) nasogastric and(10) evaluation of activities of daily living

6Biochemical

exam

(1) Urea assay (2) creatinine assay (3) uricacid assay (4) chloride assay (5) blood

cell analysis (6) potassium assay (7) sodiumassay (8) glucose assay (9) serum

bicarbonate assay and (10) excrement

Biochemicalexam

(1) Creatinine assay (2) urea assay(3) uric acid assay (4) blood cell

analysis (5) chloride assay (6) sodiumassay (7) potassium assay (8) serumbicarbonate assay (9) glucose assay

and (10) calcium assay

7Biochemical

exam

(1) Serum a-L-glucosidase assay (2) serum5primenucleotidase assay (3) plasma viscosityassay (4) glycosylated hemoglobin assay(5) whole blood viscosity (6) identification

of Rh blood group antigen (7) amylase assay(8) serum creatine kinase-MB isoenzyme

activity assay (9) lactate dehydrogenase assayand (10) serum alanine aminotransferase assay

Biochemicalexam

(1) Serum 5primenucleotidase assay (2) seruma-L-glucosidase assay (3) plasma viscosity

assay (4) serum creatine kinase-MBisoenzyme activity assay (5) glycosylated

hemoglobin assay (6) whole blood viscosity(7) identification of Rh blood group antigen(8) amylase assay (9) lactate dehydrogenaseassay and (10) serum total protein assay

8Biochemical

exam

(1) Serum aspartate aminotransferase assay(2) electrophoresis analysis of lactatedehydrogenase isoenzymes (3) urine

analysis (4) serum 7 pancreaticacyltransferase assay (5) serum alkalinephosphatase assay (6) serum alanineaminotransferase assay (7) urinalysis

(8) hepatitis A antibody assay (9) serumhigh-density lipoprotein cholesterol assayand (10) serum total cholesterol assay

Biochemicalexam

(1) Serum aspartate aminotransferase assay(2) serum albumin assay (3) urine sediment

quantitative analyze (4) inorganicphosphorus assay (5) electrophoresisanalysis of lactate dehydrogenase

isoenzymes (6) serum fructose amine assay(7) urine analysis (8) serum 7 pancreaticacyltransferase assay (9) serum alanineaminotransferase assay and (10) serum

alkaline phosphatase assay

8 Journal of Healthcare Engineering

can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice

We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree

NKQM N= 1KK

k=1

N

j=1 score Mkj log j + 1ZN

10

where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM

422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic

Table 5 Five labeled topics of IH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1Biochemical

exam

(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay

(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay

Biochemicalexam

(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay

(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay

(9) calcium assay and (10) magnesium assay

2Infectious

disease exam

(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay

(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy

Presurgicalpreparation

(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP

monitoring (9) cefuroxime and(10) infusion apparatus

3Admission

exam

(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines

(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin

(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen

Admissionexam

(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy

(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and

(10) analysis of ultrasonic

4Regular

nursing care

(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix

(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula

Regularnursing care

(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam

(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change

(9) levofloxacin and (10) middle dressing change

5Surgery

and regularnursing care

(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital

examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride

Surgery

(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application

(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring

Table 6 Topic coherence in terms of NKQMN

Dataset Method NKQM5 NKQM10 NKQM20

ICHLDA 08211 08004 07907

CDG-LDA 08448 08498 08235

IHLDA 07915 07830 07631

CDG-LDA 08316 08266 08116

9Journal of Healthcare Engineering

of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics

RE =visinΩ count v minus 1 11

whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the

(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result

Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics

423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects

(i) Examination

For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5

For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part

(ii) Treatment

Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance

3 7 11 15 19 3 7 11 15 19

0

20

40

60

80

100

ICH

0

50

100

150

200IH

LDA (N = 10) N = 10)LDA (N = 20) N = 20)

CDG-LDA (CDG-LDA (

Figure 4 Redundancy indicator RE on various topic number K and top size N

10 Journal of Healthcare Engineering

controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset

IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4

(iii) Nursing care

In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment

(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals

43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we

got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways

(i) ICH

The process can be easily divided into three stages

(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations

(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms

(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states

5 6 7 8 3

2 3

2

4

1 2 4 5

Figure 5 ICH process model

3 1

2

5 2

4

Figure 6 IH process model

11Journal of Healthcare Engineering

(ii) IH

Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery

(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP

5 Conclusion

In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation

One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC

References

[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003

[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013

[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014

[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014

[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016

[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016

[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001

[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005

[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009

[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003

[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008

[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009

[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015

[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013

[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013

[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with

12 Journal of Healthcare Engineering

visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015

[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012

[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015

[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014

[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016

[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009

[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014

13Journal of Healthcare Engineering

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 2: Incorporating Topic Assignment Constraint and Topic Correlation …downloads.hindawi.com/journals/jhe/2017/5208072.pdf · 2019-07-30 · Goal LDA (CDG-LDA). As shown in Figure 2,

technologies such as latent Dirichlet allocation (LDA) [1]and its variants are applied on the clinical data [2ndash5] Theymodel each patient trace as mixtures of multiple topics whileeach topic is modeled as a multinomial distribution overclinical activities However these approaches can hardlyreflect the temporal structure between clinical activitieswhich are important for CP

In our previous work [6] we take the advantages ofboth topic modeling and process mining to generate a

concise and interpretable topic-based process model Thekey principle is to use LDA to discover the latent topicsfrom the clinical data and extract the order relationsbetween these topics by process mining methods It is suit-able to regard the latent topics as the clinical goals based onthe following two clinical practices

(1) The clinical activities occurring in a specific timeduration (usually a hospitalized day) are prescribedaround several clinical goals

(2) Each clinical goal corresponds to a clinicalactivity set

Take the first day of ICH as the example (as shown inTable 1) For a new patient the doctors would make adiagnosis for himher by evaluating the intracranial pressure(ICP) and try to control the ICP into the normal range Theseconstitute the clinical goals for the first day To achieve thesegoals doctors have a series of corresponded choices BrainCT and the related medical imaging are suitable for theformer goal Dehydration drugs like mannitol and glycerinfructose are very useful for the latter goal Thus each clinicalday would be represented as several topics instead of thedetailed clinical activities Moreover the process miningmethod is effective in discovering the temporal structure inthese high-level sequences

However the raw LDA can hardly ensure the quality ofthe discovered topics which is critical to the interpretabilityof the final process model By carefully analyzing the clinicalscenario we find two common problems in applying rawLDA on clinical data

(1) The same clinical activities occurring in one clinicalday may be assigned different topics

(2) A clinical activity may have strong correlation tomany topics

We demonstrate two examples to explain them (1) InFigure 1(a) the four penicillin prescribed by the doctor inone clinical day are assigned to four different topics whilein the clinical scenario they are both used for the sameclinical goal which means that they should share the sametopic (2) In Figure 1(b) the clinical activity syringe rankshigh in all the three discovered topics However each clinicalactivity should have relative limited number clinical goalsrather than various clinical goals

To address the above problems we proposed a novelextension of LDA for CPM which is called Clinical DailyGoal LDA (CDG-LDA) As shown in Figure 2 we firstintroduce a constraint into LDA which can ensure thesame clinical activity in one clinical day would be assignedto the same topic (topic assignment constraint) It canimprove the consistency of the topic assignment procedurefor clinical goal discovering Second we present an effi-cient method to adaptively limit the topic number of eachclinical activity and adjust its ranking in the topicaccording to their informativeness (topic correlation lim-itation) It guarantees that a clinical activity would onlyrank high in limited clinical topics Third an effective

Table 1 The national clinical pathway of intracerebral hemorrhagereleased by the Ministry of Health of China

Stage Order

Stage 1(Day 1)

Long-term medical order

(1) neurology nursing routine (2) level I care(3) normal diet (4) keep the bed and (5) observingvital signs

Temporary medical order

(1) blood urine and stool routine examination(2) hepatorenal function electrolytes blood glucoseblood lipids cardiac enzymes coagulation functionand infectious disease screening (3) brain CT chestX-ray and ECG and (4) when necessary brain MRICTA MRA or DSA

Stage 2(Day 2)

Long-term medical order

(1) neurology nursing routine (2) level I care(3) normal diet (4) keep the bed (5) observing vitalsigns and (6) basic drugs

Temporary medical order

(1) re-examination for abnormal laboratory valuesand (2) when necessary re-examination CT

Stage 3(Day 3)

Long-term medical order

(1) neurology nursing routine (2) level I care(3) normal diet (4) keep the bed (5) observing vitalsigns and (6) basic drugs

Temporary medical order

(1) re-examination for abnormal laboratory values

Stage 4(Days 4ndash6)

Long-term medical order

(1) neurology nursing routine (2) level II care(3) normal diet (4) keep the bed (5) observing vitalsigns and (6) basic drugs

Temporary medical order

(1) re-examination for abnormal laboratory values(2) re-examination for blood kidney functionblood glucose and electrolytes and (3) whennecessary re-examination CT

Stage 5(Days 7ndash13)

Long-term medical order

(1) neurology nursing routine (2) level II-III care(3) normal diet (4) observing vital signs and(5) basic drugs

Temporary medical order

(1) re-examination for abnormal laboratory valuesand (2) when necessary DSA CTA and MRA

Stage 6(Days 8ndash14)

Long-term medical order

(1) discharge with drugs

2 Journal of Healthcare Engineering

process mining framework is applied on the topic-basedsequences to get a comprehensive process model It isworth mentioning that we use the billing data as ourexperimental data Compared to other various kinds ofclinical data billing data is the most common and cred-ible one due to its importance in insurance area And itcan be easily extracted from hospital information systemwithout complex data integration technologies Billingdata is also of the lowest data granularity and containsa lot of noise which brings a great difficulty for CPMtask Thus the CPM approach on the billing data isboth valuable and challenged

Our main contributions in this paper are as follows

(i) We incorporate topic assignment constraint intoLDA to make the inference procedure consis-tent for the same clinical activities in oneclinical day

(ii) Informativeness is introduced to adjust theranking in each topic which can improve thetopic quality

(iii) Experiments on real-world datasets demonstrate thegreatperformanceofourapproach forCPMbyapply-ing process mining on the quality discovered topics

We first give the definitions used in this paper

Definition 1 (clinical activity) A clinical activity is an eventoccurring at a particular time point which refers to the basicelement in treatment

Definition 2 (clinical day and patient trace) A patient tracecontains a series of clinical activities We segment eachpatient trace into a collection of clinical days and a clinicalday represents a set of clinical activities occurring in the samehospitalized day

2 Related Work

Early in 2001 Lin et al [7] pointed out the shortage of theexpert-designed CPs in handling the great variances causedby individual differences So that a more dynamic and

Penicillin penicillin penicillin penicillin and others

Antibacterial drugImaging

Nursing care

Blood exam

Topic imaging

(1) CT(2) Syringe(3) MRI

Topic nursing

(1) Syringe(2) Mouth care(3) Skin care

Topic drug

(1) Syringe(2) Penicillin(3) Vitamin

⦙ ⦙ ⦙

(a) (b)

Figure 1 Two problems in applying LDA on clinical data (a) the clinical activities in one day for a patient and (b) the three discovered topics

Visit ID Item name Amount Occurrence time

T1T1T1T1T1

T2T2

ABACD

BC

32125

42

20146122014612201461220146122014613

20151132015113

⦙ ⦙ ⦙ ⦙

⦙ ⦙ ⦙ ⦙

⦙ ⦙

⦙ ⦙⦙ ⦙

⦙ ⦙

A A A B B A C C

D D DD D

B B BB CC

Day 1

Day 2

Day 1

G2G1 G3T1 Topic 1 A 06 B 03 and others

Topic 2 C 07 D 01 and others

Topic 3 B 04 C 03 and others

Goal 1 imaging CT MRI and others

Goal 2 blood examCreatinine sodium and others

Goal 3 medication Penicillin vitamin and others

Data preparation Topic modeling Process mining

T1

Goal 1 Goal 2

Day 1 Day 2hellip

T2

Goals 3 and 1 Goal 1

Day 1 Day 2hellip

T1

Topic 1 Topic 2

Day 1 Day 2hellip

T2

Topics 3 and 1 Topic 1

Day 1 Day 2hellip

Busin

ess d

omai

nA

lgor

ithm

dom

ain

Pipeline

a1 a2 a3 a4 a5

T2

VOC

ABCD

Goals 3 and 1

Goal 1

Goal 2

Topic assignment constraint Topic correlation limitation

Topics 3 and 1

Topic 1

Topic 2

Figure 2 The illustrative process of our approach The top part is the pipeline middle part is the business domain and bottom part is thecorresponding algorithm domain (T patient trace a clinical activity day clinical day G group VOC vocabulary)

3Journal of Healthcare Engineering

adaptive process is needed for improving the performance ofCPs They proposed a graph mining technique to extract thetime dependency pattern of CPs for curing brain stroke Thepattern can be used for predicting the paths for a newpatient In [8] hidden Markov model (HMM) was adoptedfor inferring the CP pattern which was useful in knowledgesharing and CP updating While these methods wereseverely limited to the data deficiency and complexity andbecause CPs focus on the order relations between variousclinical events process mining technologies are widely usedfor CPM A case study on a group of 627 gynecologicaloncology patients was presented in [9] by using heuristicsminer [10] which resulted in a spaghetti-like process modelLang et al [11] evaluated the performance of seven tradi-tional process mining methods on clinical data The resultsdemonstrated that these methods can hardly obtain compre-hensive process models on clinical scenario

Researchers attempted to adopt frequency-basedmethodsto improve the ability of analyzing the complex processmodels In [12] the patient traces with similar schemasand outcomes are grouped together In [13] sequenceclustering was used to identify regular behavior processvariants and infrequent behavior on the process modeldiscovered by process mining method Zhang et al [14] pro-posed a frequency-based method to group the patients intoseveral predefined core dimensions Lakshmanan et al [15]presented a CPM framework that combined clusteringprocess mining and frequent pattern mining technologiesto mine an outcome-related process model In [16] asegmentation method was proposed to discover the frequentbehavior patterns with different time intervals CareExplorer[17] was a novel CP management tool which combinedfrequent sequence mining techniques with advanced visu-alization supports Manually defining a set of concernedand important clinical activities can significantly improvethe interpretability of the process models [18ndash20] How-ever manual works are always subjective and expensivefor CPM

Considering the complexity and quantity of clinical datatopic modeling technologies are used to discover the clinicalpatterns which are useful in CP (re)design and abnormaldetection Huang et al [2] firstly introduced LDA to extractlatent topics as the treatment patterns from clinical dataEach patient trace was regarded as a mixture of differenttreatment patterns In addition the team integrated timestamps [3] patient features [4] and comorbidities [5] intoLDA to enhance the ability of summarizing patterns How-ever treating a patient trace as a whole can hardly representthe feature of CP that the treatment process is consisted ofseveral stages with different clinical goals To deal with thisproblem our previous work [6] puts the emphasis on dis-covering the clinical goals from clinical data by LDA andderiving the order relations between these goals by processmining method It was effective in generating a conciseordinal and interpretable process model as CP While asdiscussed in [21] the performance of this approach isquite limited to the effectiveness of LDA

In this paper we extend the work in [21] to enhance theability of discovering quality clinical goals by incorporating

the topic assignment constraint and topic correlation limita-tion into LDA

3 Methodology

We first introduce the conversion from billing data to theword-count format for LDA Then we present a brief reviewof applying LDA on clinical data After analyzing the twoshortages of the raw LDA we incorporate topic assignmentconstraint and topic correlation limitation into LDA toimprove the performance of discovering latent topics fromclinical data

31 Data Preparation Billing data is the most commondata recorded in hospital information system (an exam-ple is shown in the upper-left part of Figure 2) Fourbasic attributions are essential for the topic modelingtrace ID (the identifier for one patient trace) itemname amount and occurrence time (in the unit ofday) Each billing item (one row in the table) representsa collection of clinical activities such as the first rowltA 3gt means there are three clinical activity A Notethat the amount of different kinds of items should benormalized into the same scale first We denote theclinical activity domain as the vocabulary The billingitems with the same trace ID and occurrence timeconstitute one clinical day With respect to LDA theclinical day with clinical activities corresponds to thedocument with words

32 Brief Introduction for LDA LDA is one of the mostpopular statistical topic modeling technologies It modelsthe generative process of each word from each documentin a text dataset There are two core model parametersin LDA a corpus-level distribution over words for eachtopic and a document-level distribution over topics foreach document In clinical data by regarding the clinicalday and clinical activity as the document and wordrespectively the generative process of LDA has great con-formity with the two abovementioned clinical practicesThe graphical representation is shown in Figure 3(a) andthe notations are explained in Table 2 The generative processof LDA is as follows

(1) Draw distribution ϕksimDir β k = 1 2hellip K(2) For dth clinical day d = 1 2hellipD

(a) Draw distribution θdsimDir α

(b) For ith clinical activity in dth clinical dayi = 1 2hellipNd

(i) Draw zdisimMulti θd(ii) Draw adisimMulti ϕzdi

According to the graphical model we need an infer-ence to find the parameters which can best match theobserved clinical data Gibbs sampling a special case ofMarkov chain Monte Carlo (MCMC) is the most popular

4 Journal of Healthcare Engineering

inference method for LDA It is based on the joint dis-tribution of latent topics and observed clinical activitiesas follows

P Z A α β = P Z α sdotP A Z β

= P Z θ P θ α dθ sdot P A Z ϕ P ϕ β dϕ

prop prodK

k=1prodD

d=1Γ N k

d + αk sdotprodV

v=1Γ N vk + βv

Γ Nk +V

v=1βv

1where Γ middot is the gamma function

Given the joint distribution the topic assignment foreach clinical activity can be written as follows

P zdi = k Z zdi A = P Z AP Z zdi A

propN v

k zdi + βv

V

v=1Nvk zdi + βv

sdot N kd zdi + αk

2

After the Gibbs sampling the two important hiddenparameters ϕ and θ are computed as follows

ϕ vk = N v

k + βv

V

v=1Nvk + βv

θ kd = N k

d + αk

K

k=1Nkd + αk

3

where ϕ vk is the probability of clinical activity v which

belongs to topic k and θ kd is the probability of the dth clinical

day which contains topic k

33 Topic Assignment Constraint We can see that theinference procedure is mutually independent for all theclinical activities (as shown in (2)) So that even the sameclinical activities in one clinical day may get different topicswhich violates to the clinical practice

In this section we adopt the chain graph proposed in [21]to incorporate topic assignment constraint into LDA It cancoerce the same clinical activities on one clinical day to

Table 2 Meanings of the notations

Notation Meaning

D The set of clinical days

A The set of clinical activities

Z The set of topics assigned to A

D K The number of clinical days and topics

V The number of unique clinical activities

Nd The number of clinical activities in dth clinical day

NkThe number of clinical activities that are assigned

topic k

GdThe number of groups (unique clinical activities)

in dth clinical day

Agd

The number of clinical activities in gth group indth clinical day

adi The ith clinical activities in dth clinical day

zdi The topic for adiagd The clinical activities of gth group in dth clinical day

agdj The jth clinical activity in gth group in dth clinical day

zgdj The topic for agdj

zgdzgdj

Agd

j=1 the set of topics for gth group in dth

clinical day

N kd

The number of clinical activities that are assignedtopic k in dth clinical day

N vk

The number of clinical activities with value v and topic k

N kd zgd

The number of clinical activities that are assigned topick in dth clinical day except gth group in dth clinical day

N vk zgd

The number of clinical activities with value v andtopic k except gth group in dth clinical day

α β Dirichlet prior vector

ϕ θ Multinomial distribution over clinical activitiesand topics

ϕkMultinomial distribution over clinical activities for

topic k

θd Multinomial distribution over topics for dth clinical day

휃d

k

zdi

adi

i

d

kisin[1 K] [1 Nd][1 D]

isin

isin

d

k

d1

d1

d

k [1 K]

d2

d2

ds

ds

s= d

g g g

g g g

g [1Gd][1D]

isin isin

isin

(a) (b)

Figure 3 Graphical representation of (a) LDA and (b) CDG-LDA

5Journal of Healthcare Engineering

take on the same topic The graphical model is shown inFigure 3(b) In each clinical day we first put all the sameclinical activities into a group and then use indirect edgesto connect them A function Cg

d is imposed to express thetopic assignment constraint as follows

C zgd = 1 if zgd1 = zgd2 =⋯ = zgdAg

d

0 otherwise4

For each group Cgd would equal to 1 only when all the

clinical activities in the group get the same topic Our goalis to make all the groups conform to the constraint Sothat we add Cg

d in the joint distribution of LDA (1) as follows

Pprime Z A = 1Δ P Z A prod

dgC zgd 5

where Δ is the normalization constantSimilar to the Gibbs sampling algorithm for LDA we

sample a topic for a group based on its posteriorPprime zgd = k Z zgd

A where zgd = k represents the set zgd which

has the same topic k (the notations are explained in Table 2)

Pprime zgd = k Z zgd A = P Z A

P Z zgd A

propΓ N k

d + αk

Γ N kd zgd

+ αk

sdotΓ N

αgd

k + βαgd

Γ Nk +V

vβv

Γ Nαgd

k zgd+ βα

gd

Γ Nk zgd+V

vβv

=Γ N k

d zgd+ Ag

d + αk

Γ N kd zgd

+ αk

sdotΓ N

agdk zgd

+ Agd + βα

gd

Γ Nagd

k zgd+ βα

gd

sdotΓ Nk zgd

+V

vβv

Γ Nk zgd+ Ag

d +V

vβv

= prodAgd

j=1N k

k zgd+ αk + jminus 1

sdotN

αgd

k zgd+ βα

gd+ jminus 1

Nk zgd+V

vβv + jminus 1

6

34 Topic Correlation Limitation The ranking of a clinical

activity v in a topic k is determined by ϕ vk From (3) it is

observed thatN vk is critical to the calculation of ϕ v

k In addi-

tion the sum of N vk among all topics is a fixed value which

equals to the frequency of clinical activities v (denoted as

N v ) in all clinical days Due to the independence of theinference procedure of LDA the frequency of v may be uni-formly allocated to various topics Thus a high-frequencyclinical activity is much easier to rank high in many topicswhile a low-frequency clinical activity can hardly rank highin even one topic It results in a large redundancy betweendifferent topics so that each topic is lacking of characteristicsIt is hard to regard these topics as the clinical goals

We consider using informativeness feature to solve thisproblem Our insight is that for an informative clinicalactivity it should rank high in a small number of topicswhich means that it can better represent these clinical goalsthan other uninformative clinical activities We incorporatetopic correlation limitation into LDA to achieve this targetThis limitation contains two aspects

(1) Restrict the distribution of N v over topics Higherinformativeness and less relevant topics

(2) Associate the calculation of ϕ vk with the informative-

ness of v

We use inverse document frequency (IDF see (7)) tomeasure the informativeness of each clinical activity Ingeneral informative clinical activities are expected to havehigher IDF and hence less relevant topics and higher ranking

IDF v = log Dd isin 1D v isinDd

7

Accordingly the new calculation of ϕ vk is denoted

as follows

ϕvk = ϕ v

k sdot IDF v 8

The core principle for the first aspect is to keep discardingthe most irrelevant topic for a clinical activity It is achievedby maintaining a relevant topic list for each clinical activityduring the iteration of Gibbs sampling The pseudocode forthis procedure is shown in Algorithm 1

The topic number upper bounder refers to the maximumnumber of the topics a clinical activity can correlate to Forexample given a clinical activity ECG and its topic numberupper bounder s ECG = 3 its N v can be only allocated tothree topics We set s v from 1 to K according to its IDFThen for each clinical activity value v we maintain acorresponding relevant topic list κ v In every δ iterationwe check whether the size of κ v has been reduced to thedesired number s v (line 12) To the clinical activities whichstill need to be restricted we will update their lists (line 13) bydiscarding the topic with minimum relevant value (RV) asdefined in

RV k v = N vk

rank k v 9

where rank k v is the ranking of clinical activity v in topic kbased on the multinomial distribution ϕk The numerator

N vk reflects the relevant degree of topic k to clinical activity

6 Journal of Healthcare Engineering

v And the denominator rank k v reflects the importance ofclinical activity v to topic k Thus higher RV represent morecorrelation between the clinical activity and the topic

By keeping on updating the relevant topic lists wegradually reduce the dimension of topic selection to thedesired range For instance suppose there are K = 5 topicsand a clinical activity ECG with a RV vector 50 150 5100 60 and s ECG = 3 the relevant topic list k ECGwould be changed from 1 2 3 4 5 to 1 2 4 5 in the nextupdating procedure As seen in line 15 of Algorithm 1 we dothe topic sampling from relevant topic list instead of all thetopics To the preinstance it means that we would sample atopic from topics 1 2 4 and 5 without topic 3 because ofits minimum RV

The complexity of deriving a sample zgd is K where K is the number of topics Therefore the overall com-plexity is KδGK = K2δG δ is the number of iterationfor updating relevant topic list and G =sumD

d Gd In practice Kwould not be a large value and δ could be treated as a con-stant so that G is the key factor that contributes to thecomplexity It means that the proposed method scales lin-early as we increase the total number of groups

35 Process Mining Instead of the detailed and complexclinical activities we use the discovered topics to representeach clinical day Hence each patient can be regarded as atopic-based sequence We adopt the process mining frame-work proposed in [6] to derive a concise and interpretableprocess model

4 Experiments

In this section we conducted a series of experiments todemonstrate the effectiveness of the proposed methods indiscovering quality CP model We begin with the descriptionof experimental settings

41 Experimental Settings To evaluate both the suitabilityand generality of our approach we collected two real-world billing datasets from a city-centre hospital Thetwo datasets are about two different diseases an internaldiseasemdashintracerebral hemorrhage (ICH) and a surgicaldiseasemdashinguinal hernia (IH) The detailed statistics aresummarized in Table 3 We set the Dirichlet prior toα = 1 0 and β = 0 01

The topic number K is determined by a trade-off strategyproposed in [6] It attempt to find a balance between theperplexity of topic modeling and the topic label size Theformer one is commonly used in evaluating the topic modelsrsquoprediction ability It is defined as the reciprocal geometricmean of the likelihood of the data Lower perplexity refersto better predictiveness and hence a better K while perplexityis not strongly correlated to human judgement [22] In ourapproach the size of topic label is critical to the concisenessof the final process model High topic number K wouldsignificantly reduce the interpretability of the model Thuswe select the intersection point of the two metrics acrossvarious K as the optimal topic number (K = 8 for ICH andK = 5 for IH) The iterations for updating relevant topic listδ are set to 300

42 Topic Quality The latent topics discovered by topicmodeling are interpreted as the clinical goals in ourapproach To evaluate the quality of the topics we com-pare our approach (CDG-LDA) against LDA [6] in threeaspects coherence redundancy and coverage Tables 4and 5 show the clustering results of the two methodsFor each topic we listed the top N = 10 ranked clinicalactivities and asked doctors to label it The similar topicsgenerated by different methods are in the same row Notethat given a top size N we defined (total top list) as allthe top N-ranked clinical activities among K topics Forexample refers to the 80 clinical activities for ICHwhen N = 10 and K = 8

421 Coherence We expect that the clinical activities in atopic could represent the similar clinical goal Take topic 5(the 5th row in Table 5) as the example It is observed thatthe result of LDA contains both surgery- and nursing care-related activities While in the result of CDG-LDA most ofthe clinical activities are about surgery The similar situations

Table 3 Statistics of our datasets

DiseaseTracenumber

Daynumber

Vocnumber

AvgLOS

MinLOS

MaxLOS

ICH 240 3204 752 14 2 34

IH 33 241 447 6 2 10

Input A dataset of D clinical daysTopic number K Hyper-parameters α and βIterations δ for updating relevant topic list

Output The multinomial distribution ϕ and θ1 Calculate IDF for each clinical activity2 Initialize the topic number upper bounder s v isin 1 K

for clinical activity with value v according to its IDF3 Initialize the relevant topic list κ v = 1 2hellip K for

clinical activity with value v4 Sample zgd for each group and Initialize all the count

parameters NkNkd and N v

k accordingly

5 for l = 1 to K do6 if l gt 1 then7 for v = 1 to V do8 if K minus l + 1 ge s v then9 Update κ v 10 for r = 1 to δ do11 for d = 1 to D do12 for g = 1 to Gd do13 k = zgd 14 Nk minus = Ag

d Nkd minus = Ag

d Nvk minus = Ag

d 15 Sample topic index kprime from κ v

according to (6)

16 Nkprime + = Agd N

kprimed + = Ag

d Nv

kprime + = Agd

17 Calculate and return ϕ (based on 8) and β

Algorithm 1 Gibbs sampling of CDG-LDA

7Journal of Healthcare Engineering

Table 4 Eight labeled topics of ICH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1 Imaging

(1) Doppler echocardiography (2) Dopplerultrasonography of left ventricular function(3) ventricular wall motion (4) transcranialDoppler (5) analysis of ultrasonic (6) color

Doppler ultrasonography of abdomen(7) X-radiography (8) digitized

photography (9) B mode ultrasoundand (10) film

Imaging

(1) Doppler echocardiography(2) Doppler ultrasonography of

left ventricular function (3) ventricularwall motion (4) transcranial Doppler(5) analysis of ultrasonic (6) color

Doppler ultrasonography of the abdomen(7) X-radiography (8) B mode ultrasound(9) digitized photography and (10) film

2 Medication

(1) Aceglutamide (2) sodium chloride(3) local infiltration anesthesia (4) etimicin(5) cerebrosidekinin (6) physical cooling(7) t-branch pipe (8) aerosol inhalation(9) ambroxol and (10) venous transfusion

Medication

(1) Pantoprazole (2) potassium magnesiumaspartate (3) mannitol (4) vitamin B6(5) glycerin fructose (6) vitamin C

(7) naloxone (8) piracetam (9) sodiumchloride and (10) glucose

3High-levelnursing care

(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) urinary meatus care (5) mouth care(6) skin care (7) hospital examining fee(8) mouth care package (9) ambulatory

blood pressure monitoring and(10) continuous oxygen inhalation

High-levelnursing care

(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) mouth care (5) urinary meatus care(6) skin care (7) continuous oxygeninhalation (8) mouth care package

(9) ambulatory blood pressure monitoringand (10) drainage pack

4Regular nursing

care andmedication

(1) Level II care (2) hospital examining fee(3) pantoprazole (4) vitamin B6

(5) mannitol (6) vitamin C (7) potassiummagnesium aspartate (8) venous transfusion

(9) sodium chloride and (10) glycerinfructose

Regularnursing care

(1) Level II care (2) venous transfusion(3) flusher (4) syringe (5) arterialvenous

cannula (6) therapy application(7) aceglutamide (8) intramuscular

injection (9) physical cooling venipunctureand (10) nifedipine

5Admissionexamination

(1) ECG (2) PLGA (3) fibrous protein(4) blood collection tube (5) hemostix

(6) washbasin (7) ECG event log(8) plasma prothrombin time assay

(9) activated partial thromboplastin timeand (10) prothrombin time assay

Admissionexamination

(1) Brain CT (2) venous sampling (3) ECG(4) hemostix (5) electrode (6) 12 channeldynamic electrocardiogram (7) washbasin(8) ECG event log (9) nasogastric and(10) evaluation of activities of daily living

6Biochemical

exam

(1) Urea assay (2) creatinine assay (3) uricacid assay (4) chloride assay (5) blood

cell analysis (6) potassium assay (7) sodiumassay (8) glucose assay (9) serum

bicarbonate assay and (10) excrement

Biochemicalexam

(1) Creatinine assay (2) urea assay(3) uric acid assay (4) blood cell

analysis (5) chloride assay (6) sodiumassay (7) potassium assay (8) serumbicarbonate assay (9) glucose assay

and (10) calcium assay

7Biochemical

exam

(1) Serum a-L-glucosidase assay (2) serum5primenucleotidase assay (3) plasma viscosityassay (4) glycosylated hemoglobin assay(5) whole blood viscosity (6) identification

of Rh blood group antigen (7) amylase assay(8) serum creatine kinase-MB isoenzyme

activity assay (9) lactate dehydrogenase assayand (10) serum alanine aminotransferase assay

Biochemicalexam

(1) Serum 5primenucleotidase assay (2) seruma-L-glucosidase assay (3) plasma viscosity

assay (4) serum creatine kinase-MBisoenzyme activity assay (5) glycosylated

hemoglobin assay (6) whole blood viscosity(7) identification of Rh blood group antigen(8) amylase assay (9) lactate dehydrogenaseassay and (10) serum total protein assay

8Biochemical

exam

(1) Serum aspartate aminotransferase assay(2) electrophoresis analysis of lactatedehydrogenase isoenzymes (3) urine

analysis (4) serum 7 pancreaticacyltransferase assay (5) serum alkalinephosphatase assay (6) serum alanineaminotransferase assay (7) urinalysis

(8) hepatitis A antibody assay (9) serumhigh-density lipoprotein cholesterol assayand (10) serum total cholesterol assay

Biochemicalexam

(1) Serum aspartate aminotransferase assay(2) serum albumin assay (3) urine sediment

quantitative analyze (4) inorganicphosphorus assay (5) electrophoresisanalysis of lactate dehydrogenase

isoenzymes (6) serum fructose amine assay(7) urine analysis (8) serum 7 pancreaticacyltransferase assay (9) serum alanineaminotransferase assay and (10) serum

alkaline phosphatase assay

8 Journal of Healthcare Engineering

can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice

We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree

NKQM N= 1KK

k=1

N

j=1 score Mkj log j + 1ZN

10

where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM

422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic

Table 5 Five labeled topics of IH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1Biochemical

exam

(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay

(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay

Biochemicalexam

(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay

(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay

(9) calcium assay and (10) magnesium assay

2Infectious

disease exam

(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay

(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy

Presurgicalpreparation

(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP

monitoring (9) cefuroxime and(10) infusion apparatus

3Admission

exam

(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines

(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin

(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen

Admissionexam

(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy

(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and

(10) analysis of ultrasonic

4Regular

nursing care

(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix

(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula

Regularnursing care

(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam

(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change

(9) levofloxacin and (10) middle dressing change

5Surgery

and regularnursing care

(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital

examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride

Surgery

(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application

(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring

Table 6 Topic coherence in terms of NKQMN

Dataset Method NKQM5 NKQM10 NKQM20

ICHLDA 08211 08004 07907

CDG-LDA 08448 08498 08235

IHLDA 07915 07830 07631

CDG-LDA 08316 08266 08116

9Journal of Healthcare Engineering

of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics

RE =visinΩ count v minus 1 11

whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the

(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result

Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics

423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects

(i) Examination

For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5

For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part

(ii) Treatment

Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance

3 7 11 15 19 3 7 11 15 19

0

20

40

60

80

100

ICH

0

50

100

150

200IH

LDA (N = 10) N = 10)LDA (N = 20) N = 20)

CDG-LDA (CDG-LDA (

Figure 4 Redundancy indicator RE on various topic number K and top size N

10 Journal of Healthcare Engineering

controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset

IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4

(iii) Nursing care

In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment

(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals

43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we

got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways

(i) ICH

The process can be easily divided into three stages

(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations

(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms

(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states

5 6 7 8 3

2 3

2

4

1 2 4 5

Figure 5 ICH process model

3 1

2

5 2

4

Figure 6 IH process model

11Journal of Healthcare Engineering

(ii) IH

Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery

(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP

5 Conclusion

In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation

One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC

References

[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003

[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013

[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014

[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014

[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016

[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016

[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001

[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005

[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009

[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003

[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008

[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009

[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015

[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013

[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013

[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with

12 Journal of Healthcare Engineering

visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015

[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012

[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015

[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014

[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016

[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009

[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014

13Journal of Healthcare Engineering

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 3: Incorporating Topic Assignment Constraint and Topic Correlation …downloads.hindawi.com/journals/jhe/2017/5208072.pdf · 2019-07-30 · Goal LDA (CDG-LDA). As shown in Figure 2,

process mining framework is applied on the topic-basedsequences to get a comprehensive process model It isworth mentioning that we use the billing data as ourexperimental data Compared to other various kinds ofclinical data billing data is the most common and cred-ible one due to its importance in insurance area And itcan be easily extracted from hospital information systemwithout complex data integration technologies Billingdata is also of the lowest data granularity and containsa lot of noise which brings a great difficulty for CPMtask Thus the CPM approach on the billing data isboth valuable and challenged

Our main contributions in this paper are as follows

(i) We incorporate topic assignment constraint intoLDA to make the inference procedure consis-tent for the same clinical activities in oneclinical day

(ii) Informativeness is introduced to adjust theranking in each topic which can improve thetopic quality

(iii) Experiments on real-world datasets demonstrate thegreatperformanceofourapproach forCPMbyapply-ing process mining on the quality discovered topics

We first give the definitions used in this paper

Definition 1 (clinical activity) A clinical activity is an eventoccurring at a particular time point which refers to the basicelement in treatment

Definition 2 (clinical day and patient trace) A patient tracecontains a series of clinical activities We segment eachpatient trace into a collection of clinical days and a clinicalday represents a set of clinical activities occurring in the samehospitalized day

2 Related Work

Early in 2001 Lin et al [7] pointed out the shortage of theexpert-designed CPs in handling the great variances causedby individual differences So that a more dynamic and

Penicillin penicillin penicillin penicillin and others

Antibacterial drugImaging

Nursing care

Blood exam

Topic imaging

(1) CT(2) Syringe(3) MRI

Topic nursing

(1) Syringe(2) Mouth care(3) Skin care

Topic drug

(1) Syringe(2) Penicillin(3) Vitamin

⦙ ⦙ ⦙

(a) (b)

Figure 1 Two problems in applying LDA on clinical data (a) the clinical activities in one day for a patient and (b) the three discovered topics

Visit ID Item name Amount Occurrence time

T1T1T1T1T1

T2T2

ABACD

BC

32125

42

20146122014612201461220146122014613

20151132015113

⦙ ⦙ ⦙ ⦙

⦙ ⦙ ⦙ ⦙

⦙ ⦙

⦙ ⦙⦙ ⦙

⦙ ⦙

A A A B B A C C

D D DD D

B B BB CC

Day 1

Day 2

Day 1

G2G1 G3T1 Topic 1 A 06 B 03 and others

Topic 2 C 07 D 01 and others

Topic 3 B 04 C 03 and others

Goal 1 imaging CT MRI and others

Goal 2 blood examCreatinine sodium and others

Goal 3 medication Penicillin vitamin and others

Data preparation Topic modeling Process mining

T1

Goal 1 Goal 2

Day 1 Day 2hellip

T2

Goals 3 and 1 Goal 1

Day 1 Day 2hellip

T1

Topic 1 Topic 2

Day 1 Day 2hellip

T2

Topics 3 and 1 Topic 1

Day 1 Day 2hellip

Busin

ess d

omai

nA

lgor

ithm

dom

ain

Pipeline

a1 a2 a3 a4 a5

T2

VOC

ABCD

Goals 3 and 1

Goal 1

Goal 2

Topic assignment constraint Topic correlation limitation

Topics 3 and 1

Topic 1

Topic 2

Figure 2 The illustrative process of our approach The top part is the pipeline middle part is the business domain and bottom part is thecorresponding algorithm domain (T patient trace a clinical activity day clinical day G group VOC vocabulary)

3Journal of Healthcare Engineering

adaptive process is needed for improving the performance ofCPs They proposed a graph mining technique to extract thetime dependency pattern of CPs for curing brain stroke Thepattern can be used for predicting the paths for a newpatient In [8] hidden Markov model (HMM) was adoptedfor inferring the CP pattern which was useful in knowledgesharing and CP updating While these methods wereseverely limited to the data deficiency and complexity andbecause CPs focus on the order relations between variousclinical events process mining technologies are widely usedfor CPM A case study on a group of 627 gynecologicaloncology patients was presented in [9] by using heuristicsminer [10] which resulted in a spaghetti-like process modelLang et al [11] evaluated the performance of seven tradi-tional process mining methods on clinical data The resultsdemonstrated that these methods can hardly obtain compre-hensive process models on clinical scenario

Researchers attempted to adopt frequency-basedmethodsto improve the ability of analyzing the complex processmodels In [12] the patient traces with similar schemasand outcomes are grouped together In [13] sequenceclustering was used to identify regular behavior processvariants and infrequent behavior on the process modeldiscovered by process mining method Zhang et al [14] pro-posed a frequency-based method to group the patients intoseveral predefined core dimensions Lakshmanan et al [15]presented a CPM framework that combined clusteringprocess mining and frequent pattern mining technologiesto mine an outcome-related process model In [16] asegmentation method was proposed to discover the frequentbehavior patterns with different time intervals CareExplorer[17] was a novel CP management tool which combinedfrequent sequence mining techniques with advanced visu-alization supports Manually defining a set of concernedand important clinical activities can significantly improvethe interpretability of the process models [18ndash20] How-ever manual works are always subjective and expensivefor CPM

Considering the complexity and quantity of clinical datatopic modeling technologies are used to discover the clinicalpatterns which are useful in CP (re)design and abnormaldetection Huang et al [2] firstly introduced LDA to extractlatent topics as the treatment patterns from clinical dataEach patient trace was regarded as a mixture of differenttreatment patterns In addition the team integrated timestamps [3] patient features [4] and comorbidities [5] intoLDA to enhance the ability of summarizing patterns How-ever treating a patient trace as a whole can hardly representthe feature of CP that the treatment process is consisted ofseveral stages with different clinical goals To deal with thisproblem our previous work [6] puts the emphasis on dis-covering the clinical goals from clinical data by LDA andderiving the order relations between these goals by processmining method It was effective in generating a conciseordinal and interpretable process model as CP While asdiscussed in [21] the performance of this approach isquite limited to the effectiveness of LDA

In this paper we extend the work in [21] to enhance theability of discovering quality clinical goals by incorporating

the topic assignment constraint and topic correlation limita-tion into LDA

3 Methodology

We first introduce the conversion from billing data to theword-count format for LDA Then we present a brief reviewof applying LDA on clinical data After analyzing the twoshortages of the raw LDA we incorporate topic assignmentconstraint and topic correlation limitation into LDA toimprove the performance of discovering latent topics fromclinical data

31 Data Preparation Billing data is the most commondata recorded in hospital information system (an exam-ple is shown in the upper-left part of Figure 2) Fourbasic attributions are essential for the topic modelingtrace ID (the identifier for one patient trace) itemname amount and occurrence time (in the unit ofday) Each billing item (one row in the table) representsa collection of clinical activities such as the first rowltA 3gt means there are three clinical activity A Notethat the amount of different kinds of items should benormalized into the same scale first We denote theclinical activity domain as the vocabulary The billingitems with the same trace ID and occurrence timeconstitute one clinical day With respect to LDA theclinical day with clinical activities corresponds to thedocument with words

32 Brief Introduction for LDA LDA is one of the mostpopular statistical topic modeling technologies It modelsthe generative process of each word from each documentin a text dataset There are two core model parametersin LDA a corpus-level distribution over words for eachtopic and a document-level distribution over topics foreach document In clinical data by regarding the clinicalday and clinical activity as the document and wordrespectively the generative process of LDA has great con-formity with the two abovementioned clinical practicesThe graphical representation is shown in Figure 3(a) andthe notations are explained in Table 2 The generative processof LDA is as follows

(1) Draw distribution ϕksimDir β k = 1 2hellip K(2) For dth clinical day d = 1 2hellipD

(a) Draw distribution θdsimDir α

(b) For ith clinical activity in dth clinical dayi = 1 2hellipNd

(i) Draw zdisimMulti θd(ii) Draw adisimMulti ϕzdi

According to the graphical model we need an infer-ence to find the parameters which can best match theobserved clinical data Gibbs sampling a special case ofMarkov chain Monte Carlo (MCMC) is the most popular

4 Journal of Healthcare Engineering

inference method for LDA It is based on the joint dis-tribution of latent topics and observed clinical activitiesas follows

P Z A α β = P Z α sdotP A Z β

= P Z θ P θ α dθ sdot P A Z ϕ P ϕ β dϕ

prop prodK

k=1prodD

d=1Γ N k

d + αk sdotprodV

v=1Γ N vk + βv

Γ Nk +V

v=1βv

1where Γ middot is the gamma function

Given the joint distribution the topic assignment foreach clinical activity can be written as follows

P zdi = k Z zdi A = P Z AP Z zdi A

propN v

k zdi + βv

V

v=1Nvk zdi + βv

sdot N kd zdi + αk

2

After the Gibbs sampling the two important hiddenparameters ϕ and θ are computed as follows

ϕ vk = N v

k + βv

V

v=1Nvk + βv

θ kd = N k

d + αk

K

k=1Nkd + αk

3

where ϕ vk is the probability of clinical activity v which

belongs to topic k and θ kd is the probability of the dth clinical

day which contains topic k

33 Topic Assignment Constraint We can see that theinference procedure is mutually independent for all theclinical activities (as shown in (2)) So that even the sameclinical activities in one clinical day may get different topicswhich violates to the clinical practice

In this section we adopt the chain graph proposed in [21]to incorporate topic assignment constraint into LDA It cancoerce the same clinical activities on one clinical day to

Table 2 Meanings of the notations

Notation Meaning

D The set of clinical days

A The set of clinical activities

Z The set of topics assigned to A

D K The number of clinical days and topics

V The number of unique clinical activities

Nd The number of clinical activities in dth clinical day

NkThe number of clinical activities that are assigned

topic k

GdThe number of groups (unique clinical activities)

in dth clinical day

Agd

The number of clinical activities in gth group indth clinical day

adi The ith clinical activities in dth clinical day

zdi The topic for adiagd The clinical activities of gth group in dth clinical day

agdj The jth clinical activity in gth group in dth clinical day

zgdj The topic for agdj

zgdzgdj

Agd

j=1 the set of topics for gth group in dth

clinical day

N kd

The number of clinical activities that are assignedtopic k in dth clinical day

N vk

The number of clinical activities with value v and topic k

N kd zgd

The number of clinical activities that are assigned topick in dth clinical day except gth group in dth clinical day

N vk zgd

The number of clinical activities with value v andtopic k except gth group in dth clinical day

α β Dirichlet prior vector

ϕ θ Multinomial distribution over clinical activitiesand topics

ϕkMultinomial distribution over clinical activities for

topic k

θd Multinomial distribution over topics for dth clinical day

휃d

k

zdi

adi

i

d

kisin[1 K] [1 Nd][1 D]

isin

isin

d

k

d1

d1

d

k [1 K]

d2

d2

ds

ds

s= d

g g g

g g g

g [1Gd][1D]

isin isin

isin

(a) (b)

Figure 3 Graphical representation of (a) LDA and (b) CDG-LDA

5Journal of Healthcare Engineering

take on the same topic The graphical model is shown inFigure 3(b) In each clinical day we first put all the sameclinical activities into a group and then use indirect edgesto connect them A function Cg

d is imposed to express thetopic assignment constraint as follows

C zgd = 1 if zgd1 = zgd2 =⋯ = zgdAg

d

0 otherwise4

For each group Cgd would equal to 1 only when all the

clinical activities in the group get the same topic Our goalis to make all the groups conform to the constraint Sothat we add Cg

d in the joint distribution of LDA (1) as follows

Pprime Z A = 1Δ P Z A prod

dgC zgd 5

where Δ is the normalization constantSimilar to the Gibbs sampling algorithm for LDA we

sample a topic for a group based on its posteriorPprime zgd = k Z zgd

A where zgd = k represents the set zgd which

has the same topic k (the notations are explained in Table 2)

Pprime zgd = k Z zgd A = P Z A

P Z zgd A

propΓ N k

d + αk

Γ N kd zgd

+ αk

sdotΓ N

αgd

k + βαgd

Γ Nk +V

vβv

Γ Nαgd

k zgd+ βα

gd

Γ Nk zgd+V

vβv

=Γ N k

d zgd+ Ag

d + αk

Γ N kd zgd

+ αk

sdotΓ N

agdk zgd

+ Agd + βα

gd

Γ Nagd

k zgd+ βα

gd

sdotΓ Nk zgd

+V

vβv

Γ Nk zgd+ Ag

d +V

vβv

= prodAgd

j=1N k

k zgd+ αk + jminus 1

sdotN

αgd

k zgd+ βα

gd+ jminus 1

Nk zgd+V

vβv + jminus 1

6

34 Topic Correlation Limitation The ranking of a clinical

activity v in a topic k is determined by ϕ vk From (3) it is

observed thatN vk is critical to the calculation of ϕ v

k In addi-

tion the sum of N vk among all topics is a fixed value which

equals to the frequency of clinical activities v (denoted as

N v ) in all clinical days Due to the independence of theinference procedure of LDA the frequency of v may be uni-formly allocated to various topics Thus a high-frequencyclinical activity is much easier to rank high in many topicswhile a low-frequency clinical activity can hardly rank highin even one topic It results in a large redundancy betweendifferent topics so that each topic is lacking of characteristicsIt is hard to regard these topics as the clinical goals

We consider using informativeness feature to solve thisproblem Our insight is that for an informative clinicalactivity it should rank high in a small number of topicswhich means that it can better represent these clinical goalsthan other uninformative clinical activities We incorporatetopic correlation limitation into LDA to achieve this targetThis limitation contains two aspects

(1) Restrict the distribution of N v over topics Higherinformativeness and less relevant topics

(2) Associate the calculation of ϕ vk with the informative-

ness of v

We use inverse document frequency (IDF see (7)) tomeasure the informativeness of each clinical activity Ingeneral informative clinical activities are expected to havehigher IDF and hence less relevant topics and higher ranking

IDF v = log Dd isin 1D v isinDd

7

Accordingly the new calculation of ϕ vk is denoted

as follows

ϕvk = ϕ v

k sdot IDF v 8

The core principle for the first aspect is to keep discardingthe most irrelevant topic for a clinical activity It is achievedby maintaining a relevant topic list for each clinical activityduring the iteration of Gibbs sampling The pseudocode forthis procedure is shown in Algorithm 1

The topic number upper bounder refers to the maximumnumber of the topics a clinical activity can correlate to Forexample given a clinical activity ECG and its topic numberupper bounder s ECG = 3 its N v can be only allocated tothree topics We set s v from 1 to K according to its IDFThen for each clinical activity value v we maintain acorresponding relevant topic list κ v In every δ iterationwe check whether the size of κ v has been reduced to thedesired number s v (line 12) To the clinical activities whichstill need to be restricted we will update their lists (line 13) bydiscarding the topic with minimum relevant value (RV) asdefined in

RV k v = N vk

rank k v 9

where rank k v is the ranking of clinical activity v in topic kbased on the multinomial distribution ϕk The numerator

N vk reflects the relevant degree of topic k to clinical activity

6 Journal of Healthcare Engineering

v And the denominator rank k v reflects the importance ofclinical activity v to topic k Thus higher RV represent morecorrelation between the clinical activity and the topic

By keeping on updating the relevant topic lists wegradually reduce the dimension of topic selection to thedesired range For instance suppose there are K = 5 topicsand a clinical activity ECG with a RV vector 50 150 5100 60 and s ECG = 3 the relevant topic list k ECGwould be changed from 1 2 3 4 5 to 1 2 4 5 in the nextupdating procedure As seen in line 15 of Algorithm 1 we dothe topic sampling from relevant topic list instead of all thetopics To the preinstance it means that we would sample atopic from topics 1 2 4 and 5 without topic 3 because ofits minimum RV

The complexity of deriving a sample zgd is K where K is the number of topics Therefore the overall com-plexity is KδGK = K2δG δ is the number of iterationfor updating relevant topic list and G =sumD

d Gd In practice Kwould not be a large value and δ could be treated as a con-stant so that G is the key factor that contributes to thecomplexity It means that the proposed method scales lin-early as we increase the total number of groups

35 Process Mining Instead of the detailed and complexclinical activities we use the discovered topics to representeach clinical day Hence each patient can be regarded as atopic-based sequence We adopt the process mining frame-work proposed in [6] to derive a concise and interpretableprocess model

4 Experiments

In this section we conducted a series of experiments todemonstrate the effectiveness of the proposed methods indiscovering quality CP model We begin with the descriptionof experimental settings

41 Experimental Settings To evaluate both the suitabilityand generality of our approach we collected two real-world billing datasets from a city-centre hospital Thetwo datasets are about two different diseases an internaldiseasemdashintracerebral hemorrhage (ICH) and a surgicaldiseasemdashinguinal hernia (IH) The detailed statistics aresummarized in Table 3 We set the Dirichlet prior toα = 1 0 and β = 0 01

The topic number K is determined by a trade-off strategyproposed in [6] It attempt to find a balance between theperplexity of topic modeling and the topic label size Theformer one is commonly used in evaluating the topic modelsrsquoprediction ability It is defined as the reciprocal geometricmean of the likelihood of the data Lower perplexity refersto better predictiveness and hence a better K while perplexityis not strongly correlated to human judgement [22] In ourapproach the size of topic label is critical to the concisenessof the final process model High topic number K wouldsignificantly reduce the interpretability of the model Thuswe select the intersection point of the two metrics acrossvarious K as the optimal topic number (K = 8 for ICH andK = 5 for IH) The iterations for updating relevant topic listδ are set to 300

42 Topic Quality The latent topics discovered by topicmodeling are interpreted as the clinical goals in ourapproach To evaluate the quality of the topics we com-pare our approach (CDG-LDA) against LDA [6] in threeaspects coherence redundancy and coverage Tables 4and 5 show the clustering results of the two methodsFor each topic we listed the top N = 10 ranked clinicalactivities and asked doctors to label it The similar topicsgenerated by different methods are in the same row Notethat given a top size N we defined (total top list) as allthe top N-ranked clinical activities among K topics Forexample refers to the 80 clinical activities for ICHwhen N = 10 and K = 8

421 Coherence We expect that the clinical activities in atopic could represent the similar clinical goal Take topic 5(the 5th row in Table 5) as the example It is observed thatthe result of LDA contains both surgery- and nursing care-related activities While in the result of CDG-LDA most ofthe clinical activities are about surgery The similar situations

Table 3 Statistics of our datasets

DiseaseTracenumber

Daynumber

Vocnumber

AvgLOS

MinLOS

MaxLOS

ICH 240 3204 752 14 2 34

IH 33 241 447 6 2 10

Input A dataset of D clinical daysTopic number K Hyper-parameters α and βIterations δ for updating relevant topic list

Output The multinomial distribution ϕ and θ1 Calculate IDF for each clinical activity2 Initialize the topic number upper bounder s v isin 1 K

for clinical activity with value v according to its IDF3 Initialize the relevant topic list κ v = 1 2hellip K for

clinical activity with value v4 Sample zgd for each group and Initialize all the count

parameters NkNkd and N v

k accordingly

5 for l = 1 to K do6 if l gt 1 then7 for v = 1 to V do8 if K minus l + 1 ge s v then9 Update κ v 10 for r = 1 to δ do11 for d = 1 to D do12 for g = 1 to Gd do13 k = zgd 14 Nk minus = Ag

d Nkd minus = Ag

d Nvk minus = Ag

d 15 Sample topic index kprime from κ v

according to (6)

16 Nkprime + = Agd N

kprimed + = Ag

d Nv

kprime + = Agd

17 Calculate and return ϕ (based on 8) and β

Algorithm 1 Gibbs sampling of CDG-LDA

7Journal of Healthcare Engineering

Table 4 Eight labeled topics of ICH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1 Imaging

(1) Doppler echocardiography (2) Dopplerultrasonography of left ventricular function(3) ventricular wall motion (4) transcranialDoppler (5) analysis of ultrasonic (6) color

Doppler ultrasonography of abdomen(7) X-radiography (8) digitized

photography (9) B mode ultrasoundand (10) film

Imaging

(1) Doppler echocardiography(2) Doppler ultrasonography of

left ventricular function (3) ventricularwall motion (4) transcranial Doppler(5) analysis of ultrasonic (6) color

Doppler ultrasonography of the abdomen(7) X-radiography (8) B mode ultrasound(9) digitized photography and (10) film

2 Medication

(1) Aceglutamide (2) sodium chloride(3) local infiltration anesthesia (4) etimicin(5) cerebrosidekinin (6) physical cooling(7) t-branch pipe (8) aerosol inhalation(9) ambroxol and (10) venous transfusion

Medication

(1) Pantoprazole (2) potassium magnesiumaspartate (3) mannitol (4) vitamin B6(5) glycerin fructose (6) vitamin C

(7) naloxone (8) piracetam (9) sodiumchloride and (10) glucose

3High-levelnursing care

(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) urinary meatus care (5) mouth care(6) skin care (7) hospital examining fee(8) mouth care package (9) ambulatory

blood pressure monitoring and(10) continuous oxygen inhalation

High-levelnursing care

(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) mouth care (5) urinary meatus care(6) skin care (7) continuous oxygeninhalation (8) mouth care package

(9) ambulatory blood pressure monitoringand (10) drainage pack

4Regular nursing

care andmedication

(1) Level II care (2) hospital examining fee(3) pantoprazole (4) vitamin B6

(5) mannitol (6) vitamin C (7) potassiummagnesium aspartate (8) venous transfusion

(9) sodium chloride and (10) glycerinfructose

Regularnursing care

(1) Level II care (2) venous transfusion(3) flusher (4) syringe (5) arterialvenous

cannula (6) therapy application(7) aceglutamide (8) intramuscular

injection (9) physical cooling venipunctureand (10) nifedipine

5Admissionexamination

(1) ECG (2) PLGA (3) fibrous protein(4) blood collection tube (5) hemostix

(6) washbasin (7) ECG event log(8) plasma prothrombin time assay

(9) activated partial thromboplastin timeand (10) prothrombin time assay

Admissionexamination

(1) Brain CT (2) venous sampling (3) ECG(4) hemostix (5) electrode (6) 12 channeldynamic electrocardiogram (7) washbasin(8) ECG event log (9) nasogastric and(10) evaluation of activities of daily living

6Biochemical

exam

(1) Urea assay (2) creatinine assay (3) uricacid assay (4) chloride assay (5) blood

cell analysis (6) potassium assay (7) sodiumassay (8) glucose assay (9) serum

bicarbonate assay and (10) excrement

Biochemicalexam

(1) Creatinine assay (2) urea assay(3) uric acid assay (4) blood cell

analysis (5) chloride assay (6) sodiumassay (7) potassium assay (8) serumbicarbonate assay (9) glucose assay

and (10) calcium assay

7Biochemical

exam

(1) Serum a-L-glucosidase assay (2) serum5primenucleotidase assay (3) plasma viscosityassay (4) glycosylated hemoglobin assay(5) whole blood viscosity (6) identification

of Rh blood group antigen (7) amylase assay(8) serum creatine kinase-MB isoenzyme

activity assay (9) lactate dehydrogenase assayand (10) serum alanine aminotransferase assay

Biochemicalexam

(1) Serum 5primenucleotidase assay (2) seruma-L-glucosidase assay (3) plasma viscosity

assay (4) serum creatine kinase-MBisoenzyme activity assay (5) glycosylated

hemoglobin assay (6) whole blood viscosity(7) identification of Rh blood group antigen(8) amylase assay (9) lactate dehydrogenaseassay and (10) serum total protein assay

8Biochemical

exam

(1) Serum aspartate aminotransferase assay(2) electrophoresis analysis of lactatedehydrogenase isoenzymes (3) urine

analysis (4) serum 7 pancreaticacyltransferase assay (5) serum alkalinephosphatase assay (6) serum alanineaminotransferase assay (7) urinalysis

(8) hepatitis A antibody assay (9) serumhigh-density lipoprotein cholesterol assayand (10) serum total cholesterol assay

Biochemicalexam

(1) Serum aspartate aminotransferase assay(2) serum albumin assay (3) urine sediment

quantitative analyze (4) inorganicphosphorus assay (5) electrophoresisanalysis of lactate dehydrogenase

isoenzymes (6) serum fructose amine assay(7) urine analysis (8) serum 7 pancreaticacyltransferase assay (9) serum alanineaminotransferase assay and (10) serum

alkaline phosphatase assay

8 Journal of Healthcare Engineering

can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice

We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree

NKQM N= 1KK

k=1

N

j=1 score Mkj log j + 1ZN

10

where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM

422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic

Table 5 Five labeled topics of IH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1Biochemical

exam

(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay

(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay

Biochemicalexam

(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay

(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay

(9) calcium assay and (10) magnesium assay

2Infectious

disease exam

(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay

(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy

Presurgicalpreparation

(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP

monitoring (9) cefuroxime and(10) infusion apparatus

3Admission

exam

(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines

(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin

(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen

Admissionexam

(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy

(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and

(10) analysis of ultrasonic

4Regular

nursing care

(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix

(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula

Regularnursing care

(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam

(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change

(9) levofloxacin and (10) middle dressing change

5Surgery

and regularnursing care

(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital

examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride

Surgery

(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application

(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring

Table 6 Topic coherence in terms of NKQMN

Dataset Method NKQM5 NKQM10 NKQM20

ICHLDA 08211 08004 07907

CDG-LDA 08448 08498 08235

IHLDA 07915 07830 07631

CDG-LDA 08316 08266 08116

9Journal of Healthcare Engineering

of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics

RE =visinΩ count v minus 1 11

whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the

(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result

Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics

423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects

(i) Examination

For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5

For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part

(ii) Treatment

Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance

3 7 11 15 19 3 7 11 15 19

0

20

40

60

80

100

ICH

0

50

100

150

200IH

LDA (N = 10) N = 10)LDA (N = 20) N = 20)

CDG-LDA (CDG-LDA (

Figure 4 Redundancy indicator RE on various topic number K and top size N

10 Journal of Healthcare Engineering

controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset

IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4

(iii) Nursing care

In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment

(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals

43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we

got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways

(i) ICH

The process can be easily divided into three stages

(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations

(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms

(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states

5 6 7 8 3

2 3

2

4

1 2 4 5

Figure 5 ICH process model

3 1

2

5 2

4

Figure 6 IH process model

11Journal of Healthcare Engineering

(ii) IH

Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery

(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP

5 Conclusion

In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation

One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC

References

[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003

[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013

[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014

[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014

[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016

[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016

[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001

[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005

[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009

[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003

[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008

[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009

[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015

[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013

[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013

[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with

12 Journal of Healthcare Engineering

visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015

[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012

[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015

[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014

[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016

[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009

[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014

13Journal of Healthcare Engineering

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 4: Incorporating Topic Assignment Constraint and Topic Correlation …downloads.hindawi.com/journals/jhe/2017/5208072.pdf · 2019-07-30 · Goal LDA (CDG-LDA). As shown in Figure 2,

adaptive process is needed for improving the performance ofCPs They proposed a graph mining technique to extract thetime dependency pattern of CPs for curing brain stroke Thepattern can be used for predicting the paths for a newpatient In [8] hidden Markov model (HMM) was adoptedfor inferring the CP pattern which was useful in knowledgesharing and CP updating While these methods wereseverely limited to the data deficiency and complexity andbecause CPs focus on the order relations between variousclinical events process mining technologies are widely usedfor CPM A case study on a group of 627 gynecologicaloncology patients was presented in [9] by using heuristicsminer [10] which resulted in a spaghetti-like process modelLang et al [11] evaluated the performance of seven tradi-tional process mining methods on clinical data The resultsdemonstrated that these methods can hardly obtain compre-hensive process models on clinical scenario

Researchers attempted to adopt frequency-basedmethodsto improve the ability of analyzing the complex processmodels In [12] the patient traces with similar schemasand outcomes are grouped together In [13] sequenceclustering was used to identify regular behavior processvariants and infrequent behavior on the process modeldiscovered by process mining method Zhang et al [14] pro-posed a frequency-based method to group the patients intoseveral predefined core dimensions Lakshmanan et al [15]presented a CPM framework that combined clusteringprocess mining and frequent pattern mining technologiesto mine an outcome-related process model In [16] asegmentation method was proposed to discover the frequentbehavior patterns with different time intervals CareExplorer[17] was a novel CP management tool which combinedfrequent sequence mining techniques with advanced visu-alization supports Manually defining a set of concernedand important clinical activities can significantly improvethe interpretability of the process models [18ndash20] How-ever manual works are always subjective and expensivefor CPM

Considering the complexity and quantity of clinical datatopic modeling technologies are used to discover the clinicalpatterns which are useful in CP (re)design and abnormaldetection Huang et al [2] firstly introduced LDA to extractlatent topics as the treatment patterns from clinical dataEach patient trace was regarded as a mixture of differenttreatment patterns In addition the team integrated timestamps [3] patient features [4] and comorbidities [5] intoLDA to enhance the ability of summarizing patterns How-ever treating a patient trace as a whole can hardly representthe feature of CP that the treatment process is consisted ofseveral stages with different clinical goals To deal with thisproblem our previous work [6] puts the emphasis on dis-covering the clinical goals from clinical data by LDA andderiving the order relations between these goals by processmining method It was effective in generating a conciseordinal and interpretable process model as CP While asdiscussed in [21] the performance of this approach isquite limited to the effectiveness of LDA

In this paper we extend the work in [21] to enhance theability of discovering quality clinical goals by incorporating

the topic assignment constraint and topic correlation limita-tion into LDA

3 Methodology

We first introduce the conversion from billing data to theword-count format for LDA Then we present a brief reviewof applying LDA on clinical data After analyzing the twoshortages of the raw LDA we incorporate topic assignmentconstraint and topic correlation limitation into LDA toimprove the performance of discovering latent topics fromclinical data

31 Data Preparation Billing data is the most commondata recorded in hospital information system (an exam-ple is shown in the upper-left part of Figure 2) Fourbasic attributions are essential for the topic modelingtrace ID (the identifier for one patient trace) itemname amount and occurrence time (in the unit ofday) Each billing item (one row in the table) representsa collection of clinical activities such as the first rowltA 3gt means there are three clinical activity A Notethat the amount of different kinds of items should benormalized into the same scale first We denote theclinical activity domain as the vocabulary The billingitems with the same trace ID and occurrence timeconstitute one clinical day With respect to LDA theclinical day with clinical activities corresponds to thedocument with words

32 Brief Introduction for LDA LDA is one of the mostpopular statistical topic modeling technologies It modelsthe generative process of each word from each documentin a text dataset There are two core model parametersin LDA a corpus-level distribution over words for eachtopic and a document-level distribution over topics foreach document In clinical data by regarding the clinicalday and clinical activity as the document and wordrespectively the generative process of LDA has great con-formity with the two abovementioned clinical practicesThe graphical representation is shown in Figure 3(a) andthe notations are explained in Table 2 The generative processof LDA is as follows

(1) Draw distribution ϕksimDir β k = 1 2hellip K(2) For dth clinical day d = 1 2hellipD

(a) Draw distribution θdsimDir α

(b) For ith clinical activity in dth clinical dayi = 1 2hellipNd

(i) Draw zdisimMulti θd(ii) Draw adisimMulti ϕzdi

According to the graphical model we need an infer-ence to find the parameters which can best match theobserved clinical data Gibbs sampling a special case ofMarkov chain Monte Carlo (MCMC) is the most popular

4 Journal of Healthcare Engineering

inference method for LDA It is based on the joint dis-tribution of latent topics and observed clinical activitiesas follows

P Z A α β = P Z α sdotP A Z β

= P Z θ P θ α dθ sdot P A Z ϕ P ϕ β dϕ

prop prodK

k=1prodD

d=1Γ N k

d + αk sdotprodV

v=1Γ N vk + βv

Γ Nk +V

v=1βv

1where Γ middot is the gamma function

Given the joint distribution the topic assignment foreach clinical activity can be written as follows

P zdi = k Z zdi A = P Z AP Z zdi A

propN v

k zdi + βv

V

v=1Nvk zdi + βv

sdot N kd zdi + αk

2

After the Gibbs sampling the two important hiddenparameters ϕ and θ are computed as follows

ϕ vk = N v

k + βv

V

v=1Nvk + βv

θ kd = N k

d + αk

K

k=1Nkd + αk

3

where ϕ vk is the probability of clinical activity v which

belongs to topic k and θ kd is the probability of the dth clinical

day which contains topic k

33 Topic Assignment Constraint We can see that theinference procedure is mutually independent for all theclinical activities (as shown in (2)) So that even the sameclinical activities in one clinical day may get different topicswhich violates to the clinical practice

In this section we adopt the chain graph proposed in [21]to incorporate topic assignment constraint into LDA It cancoerce the same clinical activities on one clinical day to

Table 2 Meanings of the notations

Notation Meaning

D The set of clinical days

A The set of clinical activities

Z The set of topics assigned to A

D K The number of clinical days and topics

V The number of unique clinical activities

Nd The number of clinical activities in dth clinical day

NkThe number of clinical activities that are assigned

topic k

GdThe number of groups (unique clinical activities)

in dth clinical day

Agd

The number of clinical activities in gth group indth clinical day

adi The ith clinical activities in dth clinical day

zdi The topic for adiagd The clinical activities of gth group in dth clinical day

agdj The jth clinical activity in gth group in dth clinical day

zgdj The topic for agdj

zgdzgdj

Agd

j=1 the set of topics for gth group in dth

clinical day

N kd

The number of clinical activities that are assignedtopic k in dth clinical day

N vk

The number of clinical activities with value v and topic k

N kd zgd

The number of clinical activities that are assigned topick in dth clinical day except gth group in dth clinical day

N vk zgd

The number of clinical activities with value v andtopic k except gth group in dth clinical day

α β Dirichlet prior vector

ϕ θ Multinomial distribution over clinical activitiesand topics

ϕkMultinomial distribution over clinical activities for

topic k

θd Multinomial distribution over topics for dth clinical day

휃d

k

zdi

adi

i

d

kisin[1 K] [1 Nd][1 D]

isin

isin

d

k

d1

d1

d

k [1 K]

d2

d2

ds

ds

s= d

g g g

g g g

g [1Gd][1D]

isin isin

isin

(a) (b)

Figure 3 Graphical representation of (a) LDA and (b) CDG-LDA

5Journal of Healthcare Engineering

take on the same topic The graphical model is shown inFigure 3(b) In each clinical day we first put all the sameclinical activities into a group and then use indirect edgesto connect them A function Cg

d is imposed to express thetopic assignment constraint as follows

C zgd = 1 if zgd1 = zgd2 =⋯ = zgdAg

d

0 otherwise4

For each group Cgd would equal to 1 only when all the

clinical activities in the group get the same topic Our goalis to make all the groups conform to the constraint Sothat we add Cg

d in the joint distribution of LDA (1) as follows

Pprime Z A = 1Δ P Z A prod

dgC zgd 5

where Δ is the normalization constantSimilar to the Gibbs sampling algorithm for LDA we

sample a topic for a group based on its posteriorPprime zgd = k Z zgd

A where zgd = k represents the set zgd which

has the same topic k (the notations are explained in Table 2)

Pprime zgd = k Z zgd A = P Z A

P Z zgd A

propΓ N k

d + αk

Γ N kd zgd

+ αk

sdotΓ N

αgd

k + βαgd

Γ Nk +V

vβv

Γ Nαgd

k zgd+ βα

gd

Γ Nk zgd+V

vβv

=Γ N k

d zgd+ Ag

d + αk

Γ N kd zgd

+ αk

sdotΓ N

agdk zgd

+ Agd + βα

gd

Γ Nagd

k zgd+ βα

gd

sdotΓ Nk zgd

+V

vβv

Γ Nk zgd+ Ag

d +V

vβv

= prodAgd

j=1N k

k zgd+ αk + jminus 1

sdotN

αgd

k zgd+ βα

gd+ jminus 1

Nk zgd+V

vβv + jminus 1

6

34 Topic Correlation Limitation The ranking of a clinical

activity v in a topic k is determined by ϕ vk From (3) it is

observed thatN vk is critical to the calculation of ϕ v

k In addi-

tion the sum of N vk among all topics is a fixed value which

equals to the frequency of clinical activities v (denoted as

N v ) in all clinical days Due to the independence of theinference procedure of LDA the frequency of v may be uni-formly allocated to various topics Thus a high-frequencyclinical activity is much easier to rank high in many topicswhile a low-frequency clinical activity can hardly rank highin even one topic It results in a large redundancy betweendifferent topics so that each topic is lacking of characteristicsIt is hard to regard these topics as the clinical goals

We consider using informativeness feature to solve thisproblem Our insight is that for an informative clinicalactivity it should rank high in a small number of topicswhich means that it can better represent these clinical goalsthan other uninformative clinical activities We incorporatetopic correlation limitation into LDA to achieve this targetThis limitation contains two aspects

(1) Restrict the distribution of N v over topics Higherinformativeness and less relevant topics

(2) Associate the calculation of ϕ vk with the informative-

ness of v

We use inverse document frequency (IDF see (7)) tomeasure the informativeness of each clinical activity Ingeneral informative clinical activities are expected to havehigher IDF and hence less relevant topics and higher ranking

IDF v = log Dd isin 1D v isinDd

7

Accordingly the new calculation of ϕ vk is denoted

as follows

ϕvk = ϕ v

k sdot IDF v 8

The core principle for the first aspect is to keep discardingthe most irrelevant topic for a clinical activity It is achievedby maintaining a relevant topic list for each clinical activityduring the iteration of Gibbs sampling The pseudocode forthis procedure is shown in Algorithm 1

The topic number upper bounder refers to the maximumnumber of the topics a clinical activity can correlate to Forexample given a clinical activity ECG and its topic numberupper bounder s ECG = 3 its N v can be only allocated tothree topics We set s v from 1 to K according to its IDFThen for each clinical activity value v we maintain acorresponding relevant topic list κ v In every δ iterationwe check whether the size of κ v has been reduced to thedesired number s v (line 12) To the clinical activities whichstill need to be restricted we will update their lists (line 13) bydiscarding the topic with minimum relevant value (RV) asdefined in

RV k v = N vk

rank k v 9

where rank k v is the ranking of clinical activity v in topic kbased on the multinomial distribution ϕk The numerator

N vk reflects the relevant degree of topic k to clinical activity

6 Journal of Healthcare Engineering

v And the denominator rank k v reflects the importance ofclinical activity v to topic k Thus higher RV represent morecorrelation between the clinical activity and the topic

By keeping on updating the relevant topic lists wegradually reduce the dimension of topic selection to thedesired range For instance suppose there are K = 5 topicsand a clinical activity ECG with a RV vector 50 150 5100 60 and s ECG = 3 the relevant topic list k ECGwould be changed from 1 2 3 4 5 to 1 2 4 5 in the nextupdating procedure As seen in line 15 of Algorithm 1 we dothe topic sampling from relevant topic list instead of all thetopics To the preinstance it means that we would sample atopic from topics 1 2 4 and 5 without topic 3 because ofits minimum RV

The complexity of deriving a sample zgd is K where K is the number of topics Therefore the overall com-plexity is KδGK = K2δG δ is the number of iterationfor updating relevant topic list and G =sumD

d Gd In practice Kwould not be a large value and δ could be treated as a con-stant so that G is the key factor that contributes to thecomplexity It means that the proposed method scales lin-early as we increase the total number of groups

35 Process Mining Instead of the detailed and complexclinical activities we use the discovered topics to representeach clinical day Hence each patient can be regarded as atopic-based sequence We adopt the process mining frame-work proposed in [6] to derive a concise and interpretableprocess model

4 Experiments

In this section we conducted a series of experiments todemonstrate the effectiveness of the proposed methods indiscovering quality CP model We begin with the descriptionof experimental settings

41 Experimental Settings To evaluate both the suitabilityand generality of our approach we collected two real-world billing datasets from a city-centre hospital Thetwo datasets are about two different diseases an internaldiseasemdashintracerebral hemorrhage (ICH) and a surgicaldiseasemdashinguinal hernia (IH) The detailed statistics aresummarized in Table 3 We set the Dirichlet prior toα = 1 0 and β = 0 01

The topic number K is determined by a trade-off strategyproposed in [6] It attempt to find a balance between theperplexity of topic modeling and the topic label size Theformer one is commonly used in evaluating the topic modelsrsquoprediction ability It is defined as the reciprocal geometricmean of the likelihood of the data Lower perplexity refersto better predictiveness and hence a better K while perplexityis not strongly correlated to human judgement [22] In ourapproach the size of topic label is critical to the concisenessof the final process model High topic number K wouldsignificantly reduce the interpretability of the model Thuswe select the intersection point of the two metrics acrossvarious K as the optimal topic number (K = 8 for ICH andK = 5 for IH) The iterations for updating relevant topic listδ are set to 300

42 Topic Quality The latent topics discovered by topicmodeling are interpreted as the clinical goals in ourapproach To evaluate the quality of the topics we com-pare our approach (CDG-LDA) against LDA [6] in threeaspects coherence redundancy and coverage Tables 4and 5 show the clustering results of the two methodsFor each topic we listed the top N = 10 ranked clinicalactivities and asked doctors to label it The similar topicsgenerated by different methods are in the same row Notethat given a top size N we defined (total top list) as allthe top N-ranked clinical activities among K topics Forexample refers to the 80 clinical activities for ICHwhen N = 10 and K = 8

421 Coherence We expect that the clinical activities in atopic could represent the similar clinical goal Take topic 5(the 5th row in Table 5) as the example It is observed thatthe result of LDA contains both surgery- and nursing care-related activities While in the result of CDG-LDA most ofthe clinical activities are about surgery The similar situations

Table 3 Statistics of our datasets

DiseaseTracenumber

Daynumber

Vocnumber

AvgLOS

MinLOS

MaxLOS

ICH 240 3204 752 14 2 34

IH 33 241 447 6 2 10

Input A dataset of D clinical daysTopic number K Hyper-parameters α and βIterations δ for updating relevant topic list

Output The multinomial distribution ϕ and θ1 Calculate IDF for each clinical activity2 Initialize the topic number upper bounder s v isin 1 K

for clinical activity with value v according to its IDF3 Initialize the relevant topic list κ v = 1 2hellip K for

clinical activity with value v4 Sample zgd for each group and Initialize all the count

parameters NkNkd and N v

k accordingly

5 for l = 1 to K do6 if l gt 1 then7 for v = 1 to V do8 if K minus l + 1 ge s v then9 Update κ v 10 for r = 1 to δ do11 for d = 1 to D do12 for g = 1 to Gd do13 k = zgd 14 Nk minus = Ag

d Nkd minus = Ag

d Nvk minus = Ag

d 15 Sample topic index kprime from κ v

according to (6)

16 Nkprime + = Agd N

kprimed + = Ag

d Nv

kprime + = Agd

17 Calculate and return ϕ (based on 8) and β

Algorithm 1 Gibbs sampling of CDG-LDA

7Journal of Healthcare Engineering

Table 4 Eight labeled topics of ICH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1 Imaging

(1) Doppler echocardiography (2) Dopplerultrasonography of left ventricular function(3) ventricular wall motion (4) transcranialDoppler (5) analysis of ultrasonic (6) color

Doppler ultrasonography of abdomen(7) X-radiography (8) digitized

photography (9) B mode ultrasoundand (10) film

Imaging

(1) Doppler echocardiography(2) Doppler ultrasonography of

left ventricular function (3) ventricularwall motion (4) transcranial Doppler(5) analysis of ultrasonic (6) color

Doppler ultrasonography of the abdomen(7) X-radiography (8) B mode ultrasound(9) digitized photography and (10) film

2 Medication

(1) Aceglutamide (2) sodium chloride(3) local infiltration anesthesia (4) etimicin(5) cerebrosidekinin (6) physical cooling(7) t-branch pipe (8) aerosol inhalation(9) ambroxol and (10) venous transfusion

Medication

(1) Pantoprazole (2) potassium magnesiumaspartate (3) mannitol (4) vitamin B6(5) glycerin fructose (6) vitamin C

(7) naloxone (8) piracetam (9) sodiumchloride and (10) glucose

3High-levelnursing care

(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) urinary meatus care (5) mouth care(6) skin care (7) hospital examining fee(8) mouth care package (9) ambulatory

blood pressure monitoring and(10) continuous oxygen inhalation

High-levelnursing care

(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) mouth care (5) urinary meatus care(6) skin care (7) continuous oxygeninhalation (8) mouth care package

(9) ambulatory blood pressure monitoringand (10) drainage pack

4Regular nursing

care andmedication

(1) Level II care (2) hospital examining fee(3) pantoprazole (4) vitamin B6

(5) mannitol (6) vitamin C (7) potassiummagnesium aspartate (8) venous transfusion

(9) sodium chloride and (10) glycerinfructose

Regularnursing care

(1) Level II care (2) venous transfusion(3) flusher (4) syringe (5) arterialvenous

cannula (6) therapy application(7) aceglutamide (8) intramuscular

injection (9) physical cooling venipunctureand (10) nifedipine

5Admissionexamination

(1) ECG (2) PLGA (3) fibrous protein(4) blood collection tube (5) hemostix

(6) washbasin (7) ECG event log(8) plasma prothrombin time assay

(9) activated partial thromboplastin timeand (10) prothrombin time assay

Admissionexamination

(1) Brain CT (2) venous sampling (3) ECG(4) hemostix (5) electrode (6) 12 channeldynamic electrocardiogram (7) washbasin(8) ECG event log (9) nasogastric and(10) evaluation of activities of daily living

6Biochemical

exam

(1) Urea assay (2) creatinine assay (3) uricacid assay (4) chloride assay (5) blood

cell analysis (6) potassium assay (7) sodiumassay (8) glucose assay (9) serum

bicarbonate assay and (10) excrement

Biochemicalexam

(1) Creatinine assay (2) urea assay(3) uric acid assay (4) blood cell

analysis (5) chloride assay (6) sodiumassay (7) potassium assay (8) serumbicarbonate assay (9) glucose assay

and (10) calcium assay

7Biochemical

exam

(1) Serum a-L-glucosidase assay (2) serum5primenucleotidase assay (3) plasma viscosityassay (4) glycosylated hemoglobin assay(5) whole blood viscosity (6) identification

of Rh blood group antigen (7) amylase assay(8) serum creatine kinase-MB isoenzyme

activity assay (9) lactate dehydrogenase assayand (10) serum alanine aminotransferase assay

Biochemicalexam

(1) Serum 5primenucleotidase assay (2) seruma-L-glucosidase assay (3) plasma viscosity

assay (4) serum creatine kinase-MBisoenzyme activity assay (5) glycosylated

hemoglobin assay (6) whole blood viscosity(7) identification of Rh blood group antigen(8) amylase assay (9) lactate dehydrogenaseassay and (10) serum total protein assay

8Biochemical

exam

(1) Serum aspartate aminotransferase assay(2) electrophoresis analysis of lactatedehydrogenase isoenzymes (3) urine

analysis (4) serum 7 pancreaticacyltransferase assay (5) serum alkalinephosphatase assay (6) serum alanineaminotransferase assay (7) urinalysis

(8) hepatitis A antibody assay (9) serumhigh-density lipoprotein cholesterol assayand (10) serum total cholesterol assay

Biochemicalexam

(1) Serum aspartate aminotransferase assay(2) serum albumin assay (3) urine sediment

quantitative analyze (4) inorganicphosphorus assay (5) electrophoresisanalysis of lactate dehydrogenase

isoenzymes (6) serum fructose amine assay(7) urine analysis (8) serum 7 pancreaticacyltransferase assay (9) serum alanineaminotransferase assay and (10) serum

alkaline phosphatase assay

8 Journal of Healthcare Engineering

can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice

We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree

NKQM N= 1KK

k=1

N

j=1 score Mkj log j + 1ZN

10

where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM

422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic

Table 5 Five labeled topics of IH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1Biochemical

exam

(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay

(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay

Biochemicalexam

(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay

(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay

(9) calcium assay and (10) magnesium assay

2Infectious

disease exam

(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay

(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy

Presurgicalpreparation

(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP

monitoring (9) cefuroxime and(10) infusion apparatus

3Admission

exam

(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines

(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin

(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen

Admissionexam

(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy

(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and

(10) analysis of ultrasonic

4Regular

nursing care

(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix

(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula

Regularnursing care

(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam

(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change

(9) levofloxacin and (10) middle dressing change

5Surgery

and regularnursing care

(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital

examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride

Surgery

(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application

(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring

Table 6 Topic coherence in terms of NKQMN

Dataset Method NKQM5 NKQM10 NKQM20

ICHLDA 08211 08004 07907

CDG-LDA 08448 08498 08235

IHLDA 07915 07830 07631

CDG-LDA 08316 08266 08116

9Journal of Healthcare Engineering

of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics

RE =visinΩ count v minus 1 11

whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the

(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result

Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics

423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects

(i) Examination

For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5

For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part

(ii) Treatment

Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance

3 7 11 15 19 3 7 11 15 19

0

20

40

60

80

100

ICH

0

50

100

150

200IH

LDA (N = 10) N = 10)LDA (N = 20) N = 20)

CDG-LDA (CDG-LDA (

Figure 4 Redundancy indicator RE on various topic number K and top size N

10 Journal of Healthcare Engineering

controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset

IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4

(iii) Nursing care

In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment

(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals

43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we

got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways

(i) ICH

The process can be easily divided into three stages

(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations

(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms

(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states

5 6 7 8 3

2 3

2

4

1 2 4 5

Figure 5 ICH process model

3 1

2

5 2

4

Figure 6 IH process model

11Journal of Healthcare Engineering

(ii) IH

Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery

(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP

5 Conclusion

In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation

One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC

References

[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003

[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013

[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014

[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014

[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016

[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016

[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001

[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005

[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009

[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003

[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008

[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009

[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015

[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013

[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013

[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with

12 Journal of Healthcare Engineering

visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015

[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012

[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015

[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014

[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016

[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009

[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014

13Journal of Healthcare Engineering

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 5: Incorporating Topic Assignment Constraint and Topic Correlation …downloads.hindawi.com/journals/jhe/2017/5208072.pdf · 2019-07-30 · Goal LDA (CDG-LDA). As shown in Figure 2,

inference method for LDA It is based on the joint dis-tribution of latent topics and observed clinical activitiesas follows

P Z A α β = P Z α sdotP A Z β

= P Z θ P θ α dθ sdot P A Z ϕ P ϕ β dϕ

prop prodK

k=1prodD

d=1Γ N k

d + αk sdotprodV

v=1Γ N vk + βv

Γ Nk +V

v=1βv

1where Γ middot is the gamma function

Given the joint distribution the topic assignment foreach clinical activity can be written as follows

P zdi = k Z zdi A = P Z AP Z zdi A

propN v

k zdi + βv

V

v=1Nvk zdi + βv

sdot N kd zdi + αk

2

After the Gibbs sampling the two important hiddenparameters ϕ and θ are computed as follows

ϕ vk = N v

k + βv

V

v=1Nvk + βv

θ kd = N k

d + αk

K

k=1Nkd + αk

3

where ϕ vk is the probability of clinical activity v which

belongs to topic k and θ kd is the probability of the dth clinical

day which contains topic k

33 Topic Assignment Constraint We can see that theinference procedure is mutually independent for all theclinical activities (as shown in (2)) So that even the sameclinical activities in one clinical day may get different topicswhich violates to the clinical practice

In this section we adopt the chain graph proposed in [21]to incorporate topic assignment constraint into LDA It cancoerce the same clinical activities on one clinical day to

Table 2 Meanings of the notations

Notation Meaning

D The set of clinical days

A The set of clinical activities

Z The set of topics assigned to A

D K The number of clinical days and topics

V The number of unique clinical activities

Nd The number of clinical activities in dth clinical day

NkThe number of clinical activities that are assigned

topic k

GdThe number of groups (unique clinical activities)

in dth clinical day

Agd

The number of clinical activities in gth group indth clinical day

adi The ith clinical activities in dth clinical day

zdi The topic for adiagd The clinical activities of gth group in dth clinical day

agdj The jth clinical activity in gth group in dth clinical day

zgdj The topic for agdj

zgdzgdj

Agd

j=1 the set of topics for gth group in dth

clinical day

N kd

The number of clinical activities that are assignedtopic k in dth clinical day

N vk

The number of clinical activities with value v and topic k

N kd zgd

The number of clinical activities that are assigned topick in dth clinical day except gth group in dth clinical day

N vk zgd

The number of clinical activities with value v andtopic k except gth group in dth clinical day

α β Dirichlet prior vector

ϕ θ Multinomial distribution over clinical activitiesand topics

ϕkMultinomial distribution over clinical activities for

topic k

θd Multinomial distribution over topics for dth clinical day

휃d

k

zdi

adi

i

d

kisin[1 K] [1 Nd][1 D]

isin

isin

d

k

d1

d1

d

k [1 K]

d2

d2

ds

ds

s= d

g g g

g g g

g [1Gd][1D]

isin isin

isin

(a) (b)

Figure 3 Graphical representation of (a) LDA and (b) CDG-LDA

5Journal of Healthcare Engineering

take on the same topic The graphical model is shown inFigure 3(b) In each clinical day we first put all the sameclinical activities into a group and then use indirect edgesto connect them A function Cg

d is imposed to express thetopic assignment constraint as follows

C zgd = 1 if zgd1 = zgd2 =⋯ = zgdAg

d

0 otherwise4

For each group Cgd would equal to 1 only when all the

clinical activities in the group get the same topic Our goalis to make all the groups conform to the constraint Sothat we add Cg

d in the joint distribution of LDA (1) as follows

Pprime Z A = 1Δ P Z A prod

dgC zgd 5

where Δ is the normalization constantSimilar to the Gibbs sampling algorithm for LDA we

sample a topic for a group based on its posteriorPprime zgd = k Z zgd

A where zgd = k represents the set zgd which

has the same topic k (the notations are explained in Table 2)

Pprime zgd = k Z zgd A = P Z A

P Z zgd A

propΓ N k

d + αk

Γ N kd zgd

+ αk

sdotΓ N

αgd

k + βαgd

Γ Nk +V

vβv

Γ Nαgd

k zgd+ βα

gd

Γ Nk zgd+V

vβv

=Γ N k

d zgd+ Ag

d + αk

Γ N kd zgd

+ αk

sdotΓ N

agdk zgd

+ Agd + βα

gd

Γ Nagd

k zgd+ βα

gd

sdotΓ Nk zgd

+V

vβv

Γ Nk zgd+ Ag

d +V

vβv

= prodAgd

j=1N k

k zgd+ αk + jminus 1

sdotN

αgd

k zgd+ βα

gd+ jminus 1

Nk zgd+V

vβv + jminus 1

6

34 Topic Correlation Limitation The ranking of a clinical

activity v in a topic k is determined by ϕ vk From (3) it is

observed thatN vk is critical to the calculation of ϕ v

k In addi-

tion the sum of N vk among all topics is a fixed value which

equals to the frequency of clinical activities v (denoted as

N v ) in all clinical days Due to the independence of theinference procedure of LDA the frequency of v may be uni-formly allocated to various topics Thus a high-frequencyclinical activity is much easier to rank high in many topicswhile a low-frequency clinical activity can hardly rank highin even one topic It results in a large redundancy betweendifferent topics so that each topic is lacking of characteristicsIt is hard to regard these topics as the clinical goals

We consider using informativeness feature to solve thisproblem Our insight is that for an informative clinicalactivity it should rank high in a small number of topicswhich means that it can better represent these clinical goalsthan other uninformative clinical activities We incorporatetopic correlation limitation into LDA to achieve this targetThis limitation contains two aspects

(1) Restrict the distribution of N v over topics Higherinformativeness and less relevant topics

(2) Associate the calculation of ϕ vk with the informative-

ness of v

We use inverse document frequency (IDF see (7)) tomeasure the informativeness of each clinical activity Ingeneral informative clinical activities are expected to havehigher IDF and hence less relevant topics and higher ranking

IDF v = log Dd isin 1D v isinDd

7

Accordingly the new calculation of ϕ vk is denoted

as follows

ϕvk = ϕ v

k sdot IDF v 8

The core principle for the first aspect is to keep discardingthe most irrelevant topic for a clinical activity It is achievedby maintaining a relevant topic list for each clinical activityduring the iteration of Gibbs sampling The pseudocode forthis procedure is shown in Algorithm 1

The topic number upper bounder refers to the maximumnumber of the topics a clinical activity can correlate to Forexample given a clinical activity ECG and its topic numberupper bounder s ECG = 3 its N v can be only allocated tothree topics We set s v from 1 to K according to its IDFThen for each clinical activity value v we maintain acorresponding relevant topic list κ v In every δ iterationwe check whether the size of κ v has been reduced to thedesired number s v (line 12) To the clinical activities whichstill need to be restricted we will update their lists (line 13) bydiscarding the topic with minimum relevant value (RV) asdefined in

RV k v = N vk

rank k v 9

where rank k v is the ranking of clinical activity v in topic kbased on the multinomial distribution ϕk The numerator

N vk reflects the relevant degree of topic k to clinical activity

6 Journal of Healthcare Engineering

v And the denominator rank k v reflects the importance ofclinical activity v to topic k Thus higher RV represent morecorrelation between the clinical activity and the topic

By keeping on updating the relevant topic lists wegradually reduce the dimension of topic selection to thedesired range For instance suppose there are K = 5 topicsand a clinical activity ECG with a RV vector 50 150 5100 60 and s ECG = 3 the relevant topic list k ECGwould be changed from 1 2 3 4 5 to 1 2 4 5 in the nextupdating procedure As seen in line 15 of Algorithm 1 we dothe topic sampling from relevant topic list instead of all thetopics To the preinstance it means that we would sample atopic from topics 1 2 4 and 5 without topic 3 because ofits minimum RV

The complexity of deriving a sample zgd is K where K is the number of topics Therefore the overall com-plexity is KδGK = K2δG δ is the number of iterationfor updating relevant topic list and G =sumD

d Gd In practice Kwould not be a large value and δ could be treated as a con-stant so that G is the key factor that contributes to thecomplexity It means that the proposed method scales lin-early as we increase the total number of groups

35 Process Mining Instead of the detailed and complexclinical activities we use the discovered topics to representeach clinical day Hence each patient can be regarded as atopic-based sequence We adopt the process mining frame-work proposed in [6] to derive a concise and interpretableprocess model

4 Experiments

In this section we conducted a series of experiments todemonstrate the effectiveness of the proposed methods indiscovering quality CP model We begin with the descriptionof experimental settings

41 Experimental Settings To evaluate both the suitabilityand generality of our approach we collected two real-world billing datasets from a city-centre hospital Thetwo datasets are about two different diseases an internaldiseasemdashintracerebral hemorrhage (ICH) and a surgicaldiseasemdashinguinal hernia (IH) The detailed statistics aresummarized in Table 3 We set the Dirichlet prior toα = 1 0 and β = 0 01

The topic number K is determined by a trade-off strategyproposed in [6] It attempt to find a balance between theperplexity of topic modeling and the topic label size Theformer one is commonly used in evaluating the topic modelsrsquoprediction ability It is defined as the reciprocal geometricmean of the likelihood of the data Lower perplexity refersto better predictiveness and hence a better K while perplexityis not strongly correlated to human judgement [22] In ourapproach the size of topic label is critical to the concisenessof the final process model High topic number K wouldsignificantly reduce the interpretability of the model Thuswe select the intersection point of the two metrics acrossvarious K as the optimal topic number (K = 8 for ICH andK = 5 for IH) The iterations for updating relevant topic listδ are set to 300

42 Topic Quality The latent topics discovered by topicmodeling are interpreted as the clinical goals in ourapproach To evaluate the quality of the topics we com-pare our approach (CDG-LDA) against LDA [6] in threeaspects coherence redundancy and coverage Tables 4and 5 show the clustering results of the two methodsFor each topic we listed the top N = 10 ranked clinicalactivities and asked doctors to label it The similar topicsgenerated by different methods are in the same row Notethat given a top size N we defined (total top list) as allthe top N-ranked clinical activities among K topics Forexample refers to the 80 clinical activities for ICHwhen N = 10 and K = 8

421 Coherence We expect that the clinical activities in atopic could represent the similar clinical goal Take topic 5(the 5th row in Table 5) as the example It is observed thatthe result of LDA contains both surgery- and nursing care-related activities While in the result of CDG-LDA most ofthe clinical activities are about surgery The similar situations

Table 3 Statistics of our datasets

DiseaseTracenumber

Daynumber

Vocnumber

AvgLOS

MinLOS

MaxLOS

ICH 240 3204 752 14 2 34

IH 33 241 447 6 2 10

Input A dataset of D clinical daysTopic number K Hyper-parameters α and βIterations δ for updating relevant topic list

Output The multinomial distribution ϕ and θ1 Calculate IDF for each clinical activity2 Initialize the topic number upper bounder s v isin 1 K

for clinical activity with value v according to its IDF3 Initialize the relevant topic list κ v = 1 2hellip K for

clinical activity with value v4 Sample zgd for each group and Initialize all the count

parameters NkNkd and N v

k accordingly

5 for l = 1 to K do6 if l gt 1 then7 for v = 1 to V do8 if K minus l + 1 ge s v then9 Update κ v 10 for r = 1 to δ do11 for d = 1 to D do12 for g = 1 to Gd do13 k = zgd 14 Nk minus = Ag

d Nkd minus = Ag

d Nvk minus = Ag

d 15 Sample topic index kprime from κ v

according to (6)

16 Nkprime + = Agd N

kprimed + = Ag

d Nv

kprime + = Agd

17 Calculate and return ϕ (based on 8) and β

Algorithm 1 Gibbs sampling of CDG-LDA

7Journal of Healthcare Engineering

Table 4 Eight labeled topics of ICH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1 Imaging

(1) Doppler echocardiography (2) Dopplerultrasonography of left ventricular function(3) ventricular wall motion (4) transcranialDoppler (5) analysis of ultrasonic (6) color

Doppler ultrasonography of abdomen(7) X-radiography (8) digitized

photography (9) B mode ultrasoundand (10) film

Imaging

(1) Doppler echocardiography(2) Doppler ultrasonography of

left ventricular function (3) ventricularwall motion (4) transcranial Doppler(5) analysis of ultrasonic (6) color

Doppler ultrasonography of the abdomen(7) X-radiography (8) B mode ultrasound(9) digitized photography and (10) film

2 Medication

(1) Aceglutamide (2) sodium chloride(3) local infiltration anesthesia (4) etimicin(5) cerebrosidekinin (6) physical cooling(7) t-branch pipe (8) aerosol inhalation(9) ambroxol and (10) venous transfusion

Medication

(1) Pantoprazole (2) potassium magnesiumaspartate (3) mannitol (4) vitamin B6(5) glycerin fructose (6) vitamin C

(7) naloxone (8) piracetam (9) sodiumchloride and (10) glucose

3High-levelnursing care

(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) urinary meatus care (5) mouth care(6) skin care (7) hospital examining fee(8) mouth care package (9) ambulatory

blood pressure monitoring and(10) continuous oxygen inhalation

High-levelnursing care

(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) mouth care (5) urinary meatus care(6) skin care (7) continuous oxygeninhalation (8) mouth care package

(9) ambulatory blood pressure monitoringand (10) drainage pack

4Regular nursing

care andmedication

(1) Level II care (2) hospital examining fee(3) pantoprazole (4) vitamin B6

(5) mannitol (6) vitamin C (7) potassiummagnesium aspartate (8) venous transfusion

(9) sodium chloride and (10) glycerinfructose

Regularnursing care

(1) Level II care (2) venous transfusion(3) flusher (4) syringe (5) arterialvenous

cannula (6) therapy application(7) aceglutamide (8) intramuscular

injection (9) physical cooling venipunctureand (10) nifedipine

5Admissionexamination

(1) ECG (2) PLGA (3) fibrous protein(4) blood collection tube (5) hemostix

(6) washbasin (7) ECG event log(8) plasma prothrombin time assay

(9) activated partial thromboplastin timeand (10) prothrombin time assay

Admissionexamination

(1) Brain CT (2) venous sampling (3) ECG(4) hemostix (5) electrode (6) 12 channeldynamic electrocardiogram (7) washbasin(8) ECG event log (9) nasogastric and(10) evaluation of activities of daily living

6Biochemical

exam

(1) Urea assay (2) creatinine assay (3) uricacid assay (4) chloride assay (5) blood

cell analysis (6) potassium assay (7) sodiumassay (8) glucose assay (9) serum

bicarbonate assay and (10) excrement

Biochemicalexam

(1) Creatinine assay (2) urea assay(3) uric acid assay (4) blood cell

analysis (5) chloride assay (6) sodiumassay (7) potassium assay (8) serumbicarbonate assay (9) glucose assay

and (10) calcium assay

7Biochemical

exam

(1) Serum a-L-glucosidase assay (2) serum5primenucleotidase assay (3) plasma viscosityassay (4) glycosylated hemoglobin assay(5) whole blood viscosity (6) identification

of Rh blood group antigen (7) amylase assay(8) serum creatine kinase-MB isoenzyme

activity assay (9) lactate dehydrogenase assayand (10) serum alanine aminotransferase assay

Biochemicalexam

(1) Serum 5primenucleotidase assay (2) seruma-L-glucosidase assay (3) plasma viscosity

assay (4) serum creatine kinase-MBisoenzyme activity assay (5) glycosylated

hemoglobin assay (6) whole blood viscosity(7) identification of Rh blood group antigen(8) amylase assay (9) lactate dehydrogenaseassay and (10) serum total protein assay

8Biochemical

exam

(1) Serum aspartate aminotransferase assay(2) electrophoresis analysis of lactatedehydrogenase isoenzymes (3) urine

analysis (4) serum 7 pancreaticacyltransferase assay (5) serum alkalinephosphatase assay (6) serum alanineaminotransferase assay (7) urinalysis

(8) hepatitis A antibody assay (9) serumhigh-density lipoprotein cholesterol assayand (10) serum total cholesterol assay

Biochemicalexam

(1) Serum aspartate aminotransferase assay(2) serum albumin assay (3) urine sediment

quantitative analyze (4) inorganicphosphorus assay (5) electrophoresisanalysis of lactate dehydrogenase

isoenzymes (6) serum fructose amine assay(7) urine analysis (8) serum 7 pancreaticacyltransferase assay (9) serum alanineaminotransferase assay and (10) serum

alkaline phosphatase assay

8 Journal of Healthcare Engineering

can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice

We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree

NKQM N= 1KK

k=1

N

j=1 score Mkj log j + 1ZN

10

where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM

422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic

Table 5 Five labeled topics of IH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1Biochemical

exam

(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay

(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay

Biochemicalexam

(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay

(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay

(9) calcium assay and (10) magnesium assay

2Infectious

disease exam

(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay

(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy

Presurgicalpreparation

(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP

monitoring (9) cefuroxime and(10) infusion apparatus

3Admission

exam

(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines

(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin

(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen

Admissionexam

(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy

(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and

(10) analysis of ultrasonic

4Regular

nursing care

(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix

(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula

Regularnursing care

(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam

(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change

(9) levofloxacin and (10) middle dressing change

5Surgery

and regularnursing care

(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital

examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride

Surgery

(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application

(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring

Table 6 Topic coherence in terms of NKQMN

Dataset Method NKQM5 NKQM10 NKQM20

ICHLDA 08211 08004 07907

CDG-LDA 08448 08498 08235

IHLDA 07915 07830 07631

CDG-LDA 08316 08266 08116

9Journal of Healthcare Engineering

of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics

RE =visinΩ count v minus 1 11

whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the

(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result

Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics

423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects

(i) Examination

For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5

For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part

(ii) Treatment

Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance

3 7 11 15 19 3 7 11 15 19

0

20

40

60

80

100

ICH

0

50

100

150

200IH

LDA (N = 10) N = 10)LDA (N = 20) N = 20)

CDG-LDA (CDG-LDA (

Figure 4 Redundancy indicator RE on various topic number K and top size N

10 Journal of Healthcare Engineering

controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset

IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4

(iii) Nursing care

In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment

(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals

43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we

got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways

(i) ICH

The process can be easily divided into three stages

(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations

(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms

(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states

5 6 7 8 3

2 3

2

4

1 2 4 5

Figure 5 ICH process model

3 1

2

5 2

4

Figure 6 IH process model

11Journal of Healthcare Engineering

(ii) IH

Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery

(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP

5 Conclusion

In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation

One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC

References

[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003

[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013

[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014

[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014

[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016

[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016

[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001

[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005

[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009

[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003

[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008

[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009

[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015

[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013

[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013

[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with

12 Journal of Healthcare Engineering

visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015

[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012

[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015

[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014

[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016

[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009

[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014

13Journal of Healthcare Engineering

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 6: Incorporating Topic Assignment Constraint and Topic Correlation …downloads.hindawi.com/journals/jhe/2017/5208072.pdf · 2019-07-30 · Goal LDA (CDG-LDA). As shown in Figure 2,

take on the same topic The graphical model is shown inFigure 3(b) In each clinical day we first put all the sameclinical activities into a group and then use indirect edgesto connect them A function Cg

d is imposed to express thetopic assignment constraint as follows

C zgd = 1 if zgd1 = zgd2 =⋯ = zgdAg

d

0 otherwise4

For each group Cgd would equal to 1 only when all the

clinical activities in the group get the same topic Our goalis to make all the groups conform to the constraint Sothat we add Cg

d in the joint distribution of LDA (1) as follows

Pprime Z A = 1Δ P Z A prod

dgC zgd 5

where Δ is the normalization constantSimilar to the Gibbs sampling algorithm for LDA we

sample a topic for a group based on its posteriorPprime zgd = k Z zgd

A where zgd = k represents the set zgd which

has the same topic k (the notations are explained in Table 2)

Pprime zgd = k Z zgd A = P Z A

P Z zgd A

propΓ N k

d + αk

Γ N kd zgd

+ αk

sdotΓ N

αgd

k + βαgd

Γ Nk +V

vβv

Γ Nαgd

k zgd+ βα

gd

Γ Nk zgd+V

vβv

=Γ N k

d zgd+ Ag

d + αk

Γ N kd zgd

+ αk

sdotΓ N

agdk zgd

+ Agd + βα

gd

Γ Nagd

k zgd+ βα

gd

sdotΓ Nk zgd

+V

vβv

Γ Nk zgd+ Ag

d +V

vβv

= prodAgd

j=1N k

k zgd+ αk + jminus 1

sdotN

αgd

k zgd+ βα

gd+ jminus 1

Nk zgd+V

vβv + jminus 1

6

34 Topic Correlation Limitation The ranking of a clinical

activity v in a topic k is determined by ϕ vk From (3) it is

observed thatN vk is critical to the calculation of ϕ v

k In addi-

tion the sum of N vk among all topics is a fixed value which

equals to the frequency of clinical activities v (denoted as

N v ) in all clinical days Due to the independence of theinference procedure of LDA the frequency of v may be uni-formly allocated to various topics Thus a high-frequencyclinical activity is much easier to rank high in many topicswhile a low-frequency clinical activity can hardly rank highin even one topic It results in a large redundancy betweendifferent topics so that each topic is lacking of characteristicsIt is hard to regard these topics as the clinical goals

We consider using informativeness feature to solve thisproblem Our insight is that for an informative clinicalactivity it should rank high in a small number of topicswhich means that it can better represent these clinical goalsthan other uninformative clinical activities We incorporatetopic correlation limitation into LDA to achieve this targetThis limitation contains two aspects

(1) Restrict the distribution of N v over topics Higherinformativeness and less relevant topics

(2) Associate the calculation of ϕ vk with the informative-

ness of v

We use inverse document frequency (IDF see (7)) tomeasure the informativeness of each clinical activity Ingeneral informative clinical activities are expected to havehigher IDF and hence less relevant topics and higher ranking

IDF v = log Dd isin 1D v isinDd

7

Accordingly the new calculation of ϕ vk is denoted

as follows

ϕvk = ϕ v

k sdot IDF v 8

The core principle for the first aspect is to keep discardingthe most irrelevant topic for a clinical activity It is achievedby maintaining a relevant topic list for each clinical activityduring the iteration of Gibbs sampling The pseudocode forthis procedure is shown in Algorithm 1

The topic number upper bounder refers to the maximumnumber of the topics a clinical activity can correlate to Forexample given a clinical activity ECG and its topic numberupper bounder s ECG = 3 its N v can be only allocated tothree topics We set s v from 1 to K according to its IDFThen for each clinical activity value v we maintain acorresponding relevant topic list κ v In every δ iterationwe check whether the size of κ v has been reduced to thedesired number s v (line 12) To the clinical activities whichstill need to be restricted we will update their lists (line 13) bydiscarding the topic with minimum relevant value (RV) asdefined in

RV k v = N vk

rank k v 9

where rank k v is the ranking of clinical activity v in topic kbased on the multinomial distribution ϕk The numerator

N vk reflects the relevant degree of topic k to clinical activity

6 Journal of Healthcare Engineering

v And the denominator rank k v reflects the importance ofclinical activity v to topic k Thus higher RV represent morecorrelation between the clinical activity and the topic

By keeping on updating the relevant topic lists wegradually reduce the dimension of topic selection to thedesired range For instance suppose there are K = 5 topicsand a clinical activity ECG with a RV vector 50 150 5100 60 and s ECG = 3 the relevant topic list k ECGwould be changed from 1 2 3 4 5 to 1 2 4 5 in the nextupdating procedure As seen in line 15 of Algorithm 1 we dothe topic sampling from relevant topic list instead of all thetopics To the preinstance it means that we would sample atopic from topics 1 2 4 and 5 without topic 3 because ofits minimum RV

The complexity of deriving a sample zgd is K where K is the number of topics Therefore the overall com-plexity is KδGK = K2δG δ is the number of iterationfor updating relevant topic list and G =sumD

d Gd In practice Kwould not be a large value and δ could be treated as a con-stant so that G is the key factor that contributes to thecomplexity It means that the proposed method scales lin-early as we increase the total number of groups

35 Process Mining Instead of the detailed and complexclinical activities we use the discovered topics to representeach clinical day Hence each patient can be regarded as atopic-based sequence We adopt the process mining frame-work proposed in [6] to derive a concise and interpretableprocess model

4 Experiments

In this section we conducted a series of experiments todemonstrate the effectiveness of the proposed methods indiscovering quality CP model We begin with the descriptionof experimental settings

41 Experimental Settings To evaluate both the suitabilityand generality of our approach we collected two real-world billing datasets from a city-centre hospital Thetwo datasets are about two different diseases an internaldiseasemdashintracerebral hemorrhage (ICH) and a surgicaldiseasemdashinguinal hernia (IH) The detailed statistics aresummarized in Table 3 We set the Dirichlet prior toα = 1 0 and β = 0 01

The topic number K is determined by a trade-off strategyproposed in [6] It attempt to find a balance between theperplexity of topic modeling and the topic label size Theformer one is commonly used in evaluating the topic modelsrsquoprediction ability It is defined as the reciprocal geometricmean of the likelihood of the data Lower perplexity refersto better predictiveness and hence a better K while perplexityis not strongly correlated to human judgement [22] In ourapproach the size of topic label is critical to the concisenessof the final process model High topic number K wouldsignificantly reduce the interpretability of the model Thuswe select the intersection point of the two metrics acrossvarious K as the optimal topic number (K = 8 for ICH andK = 5 for IH) The iterations for updating relevant topic listδ are set to 300

42 Topic Quality The latent topics discovered by topicmodeling are interpreted as the clinical goals in ourapproach To evaluate the quality of the topics we com-pare our approach (CDG-LDA) against LDA [6] in threeaspects coherence redundancy and coverage Tables 4and 5 show the clustering results of the two methodsFor each topic we listed the top N = 10 ranked clinicalactivities and asked doctors to label it The similar topicsgenerated by different methods are in the same row Notethat given a top size N we defined (total top list) as allthe top N-ranked clinical activities among K topics Forexample refers to the 80 clinical activities for ICHwhen N = 10 and K = 8

421 Coherence We expect that the clinical activities in atopic could represent the similar clinical goal Take topic 5(the 5th row in Table 5) as the example It is observed thatthe result of LDA contains both surgery- and nursing care-related activities While in the result of CDG-LDA most ofthe clinical activities are about surgery The similar situations

Table 3 Statistics of our datasets

DiseaseTracenumber

Daynumber

Vocnumber

AvgLOS

MinLOS

MaxLOS

ICH 240 3204 752 14 2 34

IH 33 241 447 6 2 10

Input A dataset of D clinical daysTopic number K Hyper-parameters α and βIterations δ for updating relevant topic list

Output The multinomial distribution ϕ and θ1 Calculate IDF for each clinical activity2 Initialize the topic number upper bounder s v isin 1 K

for clinical activity with value v according to its IDF3 Initialize the relevant topic list κ v = 1 2hellip K for

clinical activity with value v4 Sample zgd for each group and Initialize all the count

parameters NkNkd and N v

k accordingly

5 for l = 1 to K do6 if l gt 1 then7 for v = 1 to V do8 if K minus l + 1 ge s v then9 Update κ v 10 for r = 1 to δ do11 for d = 1 to D do12 for g = 1 to Gd do13 k = zgd 14 Nk minus = Ag

d Nkd minus = Ag

d Nvk minus = Ag

d 15 Sample topic index kprime from κ v

according to (6)

16 Nkprime + = Agd N

kprimed + = Ag

d Nv

kprime + = Agd

17 Calculate and return ϕ (based on 8) and β

Algorithm 1 Gibbs sampling of CDG-LDA

7Journal of Healthcare Engineering

Table 4 Eight labeled topics of ICH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1 Imaging

(1) Doppler echocardiography (2) Dopplerultrasonography of left ventricular function(3) ventricular wall motion (4) transcranialDoppler (5) analysis of ultrasonic (6) color

Doppler ultrasonography of abdomen(7) X-radiography (8) digitized

photography (9) B mode ultrasoundand (10) film

Imaging

(1) Doppler echocardiography(2) Doppler ultrasonography of

left ventricular function (3) ventricularwall motion (4) transcranial Doppler(5) analysis of ultrasonic (6) color

Doppler ultrasonography of the abdomen(7) X-radiography (8) B mode ultrasound(9) digitized photography and (10) film

2 Medication

(1) Aceglutamide (2) sodium chloride(3) local infiltration anesthesia (4) etimicin(5) cerebrosidekinin (6) physical cooling(7) t-branch pipe (8) aerosol inhalation(9) ambroxol and (10) venous transfusion

Medication

(1) Pantoprazole (2) potassium magnesiumaspartate (3) mannitol (4) vitamin B6(5) glycerin fructose (6) vitamin C

(7) naloxone (8) piracetam (9) sodiumchloride and (10) glucose

3High-levelnursing care

(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) urinary meatus care (5) mouth care(6) skin care (7) hospital examining fee(8) mouth care package (9) ambulatory

blood pressure monitoring and(10) continuous oxygen inhalation

High-levelnursing care

(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) mouth care (5) urinary meatus care(6) skin care (7) continuous oxygeninhalation (8) mouth care package

(9) ambulatory blood pressure monitoringand (10) drainage pack

4Regular nursing

care andmedication

(1) Level II care (2) hospital examining fee(3) pantoprazole (4) vitamin B6

(5) mannitol (6) vitamin C (7) potassiummagnesium aspartate (8) venous transfusion

(9) sodium chloride and (10) glycerinfructose

Regularnursing care

(1) Level II care (2) venous transfusion(3) flusher (4) syringe (5) arterialvenous

cannula (6) therapy application(7) aceglutamide (8) intramuscular

injection (9) physical cooling venipunctureand (10) nifedipine

5Admissionexamination

(1) ECG (2) PLGA (3) fibrous protein(4) blood collection tube (5) hemostix

(6) washbasin (7) ECG event log(8) plasma prothrombin time assay

(9) activated partial thromboplastin timeand (10) prothrombin time assay

Admissionexamination

(1) Brain CT (2) venous sampling (3) ECG(4) hemostix (5) electrode (6) 12 channeldynamic electrocardiogram (7) washbasin(8) ECG event log (9) nasogastric and(10) evaluation of activities of daily living

6Biochemical

exam

(1) Urea assay (2) creatinine assay (3) uricacid assay (4) chloride assay (5) blood

cell analysis (6) potassium assay (7) sodiumassay (8) glucose assay (9) serum

bicarbonate assay and (10) excrement

Biochemicalexam

(1) Creatinine assay (2) urea assay(3) uric acid assay (4) blood cell

analysis (5) chloride assay (6) sodiumassay (7) potassium assay (8) serumbicarbonate assay (9) glucose assay

and (10) calcium assay

7Biochemical

exam

(1) Serum a-L-glucosidase assay (2) serum5primenucleotidase assay (3) plasma viscosityassay (4) glycosylated hemoglobin assay(5) whole blood viscosity (6) identification

of Rh blood group antigen (7) amylase assay(8) serum creatine kinase-MB isoenzyme

activity assay (9) lactate dehydrogenase assayand (10) serum alanine aminotransferase assay

Biochemicalexam

(1) Serum 5primenucleotidase assay (2) seruma-L-glucosidase assay (3) plasma viscosity

assay (4) serum creatine kinase-MBisoenzyme activity assay (5) glycosylated

hemoglobin assay (6) whole blood viscosity(7) identification of Rh blood group antigen(8) amylase assay (9) lactate dehydrogenaseassay and (10) serum total protein assay

8Biochemical

exam

(1) Serum aspartate aminotransferase assay(2) electrophoresis analysis of lactatedehydrogenase isoenzymes (3) urine

analysis (4) serum 7 pancreaticacyltransferase assay (5) serum alkalinephosphatase assay (6) serum alanineaminotransferase assay (7) urinalysis

(8) hepatitis A antibody assay (9) serumhigh-density lipoprotein cholesterol assayand (10) serum total cholesterol assay

Biochemicalexam

(1) Serum aspartate aminotransferase assay(2) serum albumin assay (3) urine sediment

quantitative analyze (4) inorganicphosphorus assay (5) electrophoresisanalysis of lactate dehydrogenase

isoenzymes (6) serum fructose amine assay(7) urine analysis (8) serum 7 pancreaticacyltransferase assay (9) serum alanineaminotransferase assay and (10) serum

alkaline phosphatase assay

8 Journal of Healthcare Engineering

can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice

We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree

NKQM N= 1KK

k=1

N

j=1 score Mkj log j + 1ZN

10

where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM

422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic

Table 5 Five labeled topics of IH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1Biochemical

exam

(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay

(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay

Biochemicalexam

(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay

(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay

(9) calcium assay and (10) magnesium assay

2Infectious

disease exam

(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay

(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy

Presurgicalpreparation

(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP

monitoring (9) cefuroxime and(10) infusion apparatus

3Admission

exam

(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines

(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin

(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen

Admissionexam

(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy

(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and

(10) analysis of ultrasonic

4Regular

nursing care

(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix

(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula

Regularnursing care

(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam

(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change

(9) levofloxacin and (10) middle dressing change

5Surgery

and regularnursing care

(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital

examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride

Surgery

(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application

(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring

Table 6 Topic coherence in terms of NKQMN

Dataset Method NKQM5 NKQM10 NKQM20

ICHLDA 08211 08004 07907

CDG-LDA 08448 08498 08235

IHLDA 07915 07830 07631

CDG-LDA 08316 08266 08116

9Journal of Healthcare Engineering

of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics

RE =visinΩ count v minus 1 11

whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the

(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result

Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics

423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects

(i) Examination

For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5

For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part

(ii) Treatment

Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance

3 7 11 15 19 3 7 11 15 19

0

20

40

60

80

100

ICH

0

50

100

150

200IH

LDA (N = 10) N = 10)LDA (N = 20) N = 20)

CDG-LDA (CDG-LDA (

Figure 4 Redundancy indicator RE on various topic number K and top size N

10 Journal of Healthcare Engineering

controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset

IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4

(iii) Nursing care

In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment

(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals

43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we

got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways

(i) ICH

The process can be easily divided into three stages

(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations

(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms

(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states

5 6 7 8 3

2 3

2

4

1 2 4 5

Figure 5 ICH process model

3 1

2

5 2

4

Figure 6 IH process model

11Journal of Healthcare Engineering

(ii) IH

Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery

(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP

5 Conclusion

In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation

One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC

References

[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003

[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013

[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014

[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014

[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016

[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016

[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001

[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005

[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009

[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003

[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008

[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009

[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015

[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013

[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013

[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with

12 Journal of Healthcare Engineering

visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015

[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012

[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015

[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014

[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016

[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009

[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014

13Journal of Healthcare Engineering

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 7: Incorporating Topic Assignment Constraint and Topic Correlation …downloads.hindawi.com/journals/jhe/2017/5208072.pdf · 2019-07-30 · Goal LDA (CDG-LDA). As shown in Figure 2,

v And the denominator rank k v reflects the importance ofclinical activity v to topic k Thus higher RV represent morecorrelation between the clinical activity and the topic

By keeping on updating the relevant topic lists wegradually reduce the dimension of topic selection to thedesired range For instance suppose there are K = 5 topicsand a clinical activity ECG with a RV vector 50 150 5100 60 and s ECG = 3 the relevant topic list k ECGwould be changed from 1 2 3 4 5 to 1 2 4 5 in the nextupdating procedure As seen in line 15 of Algorithm 1 we dothe topic sampling from relevant topic list instead of all thetopics To the preinstance it means that we would sample atopic from topics 1 2 4 and 5 without topic 3 because ofits minimum RV

The complexity of deriving a sample zgd is K where K is the number of topics Therefore the overall com-plexity is KδGK = K2δG δ is the number of iterationfor updating relevant topic list and G =sumD

d Gd In practice Kwould not be a large value and δ could be treated as a con-stant so that G is the key factor that contributes to thecomplexity It means that the proposed method scales lin-early as we increase the total number of groups

35 Process Mining Instead of the detailed and complexclinical activities we use the discovered topics to representeach clinical day Hence each patient can be regarded as atopic-based sequence We adopt the process mining frame-work proposed in [6] to derive a concise and interpretableprocess model

4 Experiments

In this section we conducted a series of experiments todemonstrate the effectiveness of the proposed methods indiscovering quality CP model We begin with the descriptionof experimental settings

41 Experimental Settings To evaluate both the suitabilityand generality of our approach we collected two real-world billing datasets from a city-centre hospital Thetwo datasets are about two different diseases an internaldiseasemdashintracerebral hemorrhage (ICH) and a surgicaldiseasemdashinguinal hernia (IH) The detailed statistics aresummarized in Table 3 We set the Dirichlet prior toα = 1 0 and β = 0 01

The topic number K is determined by a trade-off strategyproposed in [6] It attempt to find a balance between theperplexity of topic modeling and the topic label size Theformer one is commonly used in evaluating the topic modelsrsquoprediction ability It is defined as the reciprocal geometricmean of the likelihood of the data Lower perplexity refersto better predictiveness and hence a better K while perplexityis not strongly correlated to human judgement [22] In ourapproach the size of topic label is critical to the concisenessof the final process model High topic number K wouldsignificantly reduce the interpretability of the model Thuswe select the intersection point of the two metrics acrossvarious K as the optimal topic number (K = 8 for ICH andK = 5 for IH) The iterations for updating relevant topic listδ are set to 300

42 Topic Quality The latent topics discovered by topicmodeling are interpreted as the clinical goals in ourapproach To evaluate the quality of the topics we com-pare our approach (CDG-LDA) against LDA [6] in threeaspects coherence redundancy and coverage Tables 4and 5 show the clustering results of the two methodsFor each topic we listed the top N = 10 ranked clinicalactivities and asked doctors to label it The similar topicsgenerated by different methods are in the same row Notethat given a top size N we defined (total top list) as allthe top N-ranked clinical activities among K topics Forexample refers to the 80 clinical activities for ICHwhen N = 10 and K = 8

421 Coherence We expect that the clinical activities in atopic could represent the similar clinical goal Take topic 5(the 5th row in Table 5) as the example It is observed thatthe result of LDA contains both surgery- and nursing care-related activities While in the result of CDG-LDA most ofthe clinical activities are about surgery The similar situations

Table 3 Statistics of our datasets

DiseaseTracenumber

Daynumber

Vocnumber

AvgLOS

MinLOS

MaxLOS

ICH 240 3204 752 14 2 34

IH 33 241 447 6 2 10

Input A dataset of D clinical daysTopic number K Hyper-parameters α and βIterations δ for updating relevant topic list

Output The multinomial distribution ϕ and θ1 Calculate IDF for each clinical activity2 Initialize the topic number upper bounder s v isin 1 K

for clinical activity with value v according to its IDF3 Initialize the relevant topic list κ v = 1 2hellip K for

clinical activity with value v4 Sample zgd for each group and Initialize all the count

parameters NkNkd and N v

k accordingly

5 for l = 1 to K do6 if l gt 1 then7 for v = 1 to V do8 if K minus l + 1 ge s v then9 Update κ v 10 for r = 1 to δ do11 for d = 1 to D do12 for g = 1 to Gd do13 k = zgd 14 Nk minus = Ag

d Nkd minus = Ag

d Nvk minus = Ag

d 15 Sample topic index kprime from κ v

according to (6)

16 Nkprime + = Agd N

kprimed + = Ag

d Nv

kprime + = Agd

17 Calculate and return ϕ (based on 8) and β

Algorithm 1 Gibbs sampling of CDG-LDA

7Journal of Healthcare Engineering

Table 4 Eight labeled topics of ICH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1 Imaging

(1) Doppler echocardiography (2) Dopplerultrasonography of left ventricular function(3) ventricular wall motion (4) transcranialDoppler (5) analysis of ultrasonic (6) color

Doppler ultrasonography of abdomen(7) X-radiography (8) digitized

photography (9) B mode ultrasoundand (10) film

Imaging

(1) Doppler echocardiography(2) Doppler ultrasonography of

left ventricular function (3) ventricularwall motion (4) transcranial Doppler(5) analysis of ultrasonic (6) color

Doppler ultrasonography of the abdomen(7) X-radiography (8) B mode ultrasound(9) digitized photography and (10) film

2 Medication

(1) Aceglutamide (2) sodium chloride(3) local infiltration anesthesia (4) etimicin(5) cerebrosidekinin (6) physical cooling(7) t-branch pipe (8) aerosol inhalation(9) ambroxol and (10) venous transfusion

Medication

(1) Pantoprazole (2) potassium magnesiumaspartate (3) mannitol (4) vitamin B6(5) glycerin fructose (6) vitamin C

(7) naloxone (8) piracetam (9) sodiumchloride and (10) glucose

3High-levelnursing care

(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) urinary meatus care (5) mouth care(6) skin care (7) hospital examining fee(8) mouth care package (9) ambulatory

blood pressure monitoring and(10) continuous oxygen inhalation

High-levelnursing care

(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) mouth care (5) urinary meatus care(6) skin care (7) continuous oxygeninhalation (8) mouth care package

(9) ambulatory blood pressure monitoringand (10) drainage pack

4Regular nursing

care andmedication

(1) Level II care (2) hospital examining fee(3) pantoprazole (4) vitamin B6

(5) mannitol (6) vitamin C (7) potassiummagnesium aspartate (8) venous transfusion

(9) sodium chloride and (10) glycerinfructose

Regularnursing care

(1) Level II care (2) venous transfusion(3) flusher (4) syringe (5) arterialvenous

cannula (6) therapy application(7) aceglutamide (8) intramuscular

injection (9) physical cooling venipunctureand (10) nifedipine

5Admissionexamination

(1) ECG (2) PLGA (3) fibrous protein(4) blood collection tube (5) hemostix

(6) washbasin (7) ECG event log(8) plasma prothrombin time assay

(9) activated partial thromboplastin timeand (10) prothrombin time assay

Admissionexamination

(1) Brain CT (2) venous sampling (3) ECG(4) hemostix (5) electrode (6) 12 channeldynamic electrocardiogram (7) washbasin(8) ECG event log (9) nasogastric and(10) evaluation of activities of daily living

6Biochemical

exam

(1) Urea assay (2) creatinine assay (3) uricacid assay (4) chloride assay (5) blood

cell analysis (6) potassium assay (7) sodiumassay (8) glucose assay (9) serum

bicarbonate assay and (10) excrement

Biochemicalexam

(1) Creatinine assay (2) urea assay(3) uric acid assay (4) blood cell

analysis (5) chloride assay (6) sodiumassay (7) potassium assay (8) serumbicarbonate assay (9) glucose assay

and (10) calcium assay

7Biochemical

exam

(1) Serum a-L-glucosidase assay (2) serum5primenucleotidase assay (3) plasma viscosityassay (4) glycosylated hemoglobin assay(5) whole blood viscosity (6) identification

of Rh blood group antigen (7) amylase assay(8) serum creatine kinase-MB isoenzyme

activity assay (9) lactate dehydrogenase assayand (10) serum alanine aminotransferase assay

Biochemicalexam

(1) Serum 5primenucleotidase assay (2) seruma-L-glucosidase assay (3) plasma viscosity

assay (4) serum creatine kinase-MBisoenzyme activity assay (5) glycosylated

hemoglobin assay (6) whole blood viscosity(7) identification of Rh blood group antigen(8) amylase assay (9) lactate dehydrogenaseassay and (10) serum total protein assay

8Biochemical

exam

(1) Serum aspartate aminotransferase assay(2) electrophoresis analysis of lactatedehydrogenase isoenzymes (3) urine

analysis (4) serum 7 pancreaticacyltransferase assay (5) serum alkalinephosphatase assay (6) serum alanineaminotransferase assay (7) urinalysis

(8) hepatitis A antibody assay (9) serumhigh-density lipoprotein cholesterol assayand (10) serum total cholesterol assay

Biochemicalexam

(1) Serum aspartate aminotransferase assay(2) serum albumin assay (3) urine sediment

quantitative analyze (4) inorganicphosphorus assay (5) electrophoresisanalysis of lactate dehydrogenase

isoenzymes (6) serum fructose amine assay(7) urine analysis (8) serum 7 pancreaticacyltransferase assay (9) serum alanineaminotransferase assay and (10) serum

alkaline phosphatase assay

8 Journal of Healthcare Engineering

can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice

We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree

NKQM N= 1KK

k=1

N

j=1 score Mkj log j + 1ZN

10

where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM

422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic

Table 5 Five labeled topics of IH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1Biochemical

exam

(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay

(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay

Biochemicalexam

(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay

(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay

(9) calcium assay and (10) magnesium assay

2Infectious

disease exam

(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay

(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy

Presurgicalpreparation

(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP

monitoring (9) cefuroxime and(10) infusion apparatus

3Admission

exam

(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines

(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin

(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen

Admissionexam

(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy

(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and

(10) analysis of ultrasonic

4Regular

nursing care

(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix

(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula

Regularnursing care

(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam

(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change

(9) levofloxacin and (10) middle dressing change

5Surgery

and regularnursing care

(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital

examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride

Surgery

(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application

(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring

Table 6 Topic coherence in terms of NKQMN

Dataset Method NKQM5 NKQM10 NKQM20

ICHLDA 08211 08004 07907

CDG-LDA 08448 08498 08235

IHLDA 07915 07830 07631

CDG-LDA 08316 08266 08116

9Journal of Healthcare Engineering

of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics

RE =visinΩ count v minus 1 11

whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the

(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result

Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics

423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects

(i) Examination

For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5

For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part

(ii) Treatment

Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance

3 7 11 15 19 3 7 11 15 19

0

20

40

60

80

100

ICH

0

50

100

150

200IH

LDA (N = 10) N = 10)LDA (N = 20) N = 20)

CDG-LDA (CDG-LDA (

Figure 4 Redundancy indicator RE on various topic number K and top size N

10 Journal of Healthcare Engineering

controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset

IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4

(iii) Nursing care

In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment

(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals

43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we

got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways

(i) ICH

The process can be easily divided into three stages

(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations

(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms

(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states

5 6 7 8 3

2 3

2

4

1 2 4 5

Figure 5 ICH process model

3 1

2

5 2

4

Figure 6 IH process model

11Journal of Healthcare Engineering

(ii) IH

Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery

(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP

5 Conclusion

In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation

One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC

References

[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003

[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013

[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014

[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014

[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016

[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016

[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001

[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005

[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009

[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003

[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008

[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009

[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015

[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013

[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013

[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with

12 Journal of Healthcare Engineering

visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015

[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012

[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015

[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014

[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016

[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009

[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014

13Journal of Healthcare Engineering

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 8: Incorporating Topic Assignment Constraint and Topic Correlation …downloads.hindawi.com/journals/jhe/2017/5208072.pdf · 2019-07-30 · Goal LDA (CDG-LDA). As shown in Figure 2,

Table 4 Eight labeled topics of ICH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1 Imaging

(1) Doppler echocardiography (2) Dopplerultrasonography of left ventricular function(3) ventricular wall motion (4) transcranialDoppler (5) analysis of ultrasonic (6) color

Doppler ultrasonography of abdomen(7) X-radiography (8) digitized

photography (9) B mode ultrasoundand (10) film

Imaging

(1) Doppler echocardiography(2) Doppler ultrasonography of

left ventricular function (3) ventricularwall motion (4) transcranial Doppler(5) analysis of ultrasonic (6) color

Doppler ultrasonography of the abdomen(7) X-radiography (8) B mode ultrasound(9) digitized photography and (10) film

2 Medication

(1) Aceglutamide (2) sodium chloride(3) local infiltration anesthesia (4) etimicin(5) cerebrosidekinin (6) physical cooling(7) t-branch pipe (8) aerosol inhalation(9) ambroxol and (10) venous transfusion

Medication

(1) Pantoprazole (2) potassium magnesiumaspartate (3) mannitol (4) vitamin B6(5) glycerin fructose (6) vitamin C

(7) naloxone (8) piracetam (9) sodiumchloride and (10) glucose

3High-levelnursing care

(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) urinary meatus care (5) mouth care(6) skin care (7) hospital examining fee(8) mouth care package (9) ambulatory

blood pressure monitoring and(10) continuous oxygen inhalation

High-levelnursing care

(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) mouth care (5) urinary meatus care(6) skin care (7) continuous oxygeninhalation (8) mouth care package

(9) ambulatory blood pressure monitoringand (10) drainage pack

4Regular nursing

care andmedication

(1) Level II care (2) hospital examining fee(3) pantoprazole (4) vitamin B6

(5) mannitol (6) vitamin C (7) potassiummagnesium aspartate (8) venous transfusion

(9) sodium chloride and (10) glycerinfructose

Regularnursing care

(1) Level II care (2) venous transfusion(3) flusher (4) syringe (5) arterialvenous

cannula (6) therapy application(7) aceglutamide (8) intramuscular

injection (9) physical cooling venipunctureand (10) nifedipine

5Admissionexamination

(1) ECG (2) PLGA (3) fibrous protein(4) blood collection tube (5) hemostix

(6) washbasin (7) ECG event log(8) plasma prothrombin time assay

(9) activated partial thromboplastin timeand (10) prothrombin time assay

Admissionexamination

(1) Brain CT (2) venous sampling (3) ECG(4) hemostix (5) electrode (6) 12 channeldynamic electrocardiogram (7) washbasin(8) ECG event log (9) nasogastric and(10) evaluation of activities of daily living

6Biochemical

exam

(1) Urea assay (2) creatinine assay (3) uricacid assay (4) chloride assay (5) blood

cell analysis (6) potassium assay (7) sodiumassay (8) glucose assay (9) serum

bicarbonate assay and (10) excrement

Biochemicalexam

(1) Creatinine assay (2) urea assay(3) uric acid assay (4) blood cell

analysis (5) chloride assay (6) sodiumassay (7) potassium assay (8) serumbicarbonate assay (9) glucose assay

and (10) calcium assay

7Biochemical

exam

(1) Serum a-L-glucosidase assay (2) serum5primenucleotidase assay (3) plasma viscosityassay (4) glycosylated hemoglobin assay(5) whole blood viscosity (6) identification

of Rh blood group antigen (7) amylase assay(8) serum creatine kinase-MB isoenzyme

activity assay (9) lactate dehydrogenase assayand (10) serum alanine aminotransferase assay

Biochemicalexam

(1) Serum 5primenucleotidase assay (2) seruma-L-glucosidase assay (3) plasma viscosity

assay (4) serum creatine kinase-MBisoenzyme activity assay (5) glycosylated

hemoglobin assay (6) whole blood viscosity(7) identification of Rh blood group antigen(8) amylase assay (9) lactate dehydrogenaseassay and (10) serum total protein assay

8Biochemical

exam

(1) Serum aspartate aminotransferase assay(2) electrophoresis analysis of lactatedehydrogenase isoenzymes (3) urine

analysis (4) serum 7 pancreaticacyltransferase assay (5) serum alkalinephosphatase assay (6) serum alanineaminotransferase assay (7) urinalysis

(8) hepatitis A antibody assay (9) serumhigh-density lipoprotein cholesterol assayand (10) serum total cholesterol assay

Biochemicalexam

(1) Serum aspartate aminotransferase assay(2) serum albumin assay (3) urine sediment

quantitative analyze (4) inorganicphosphorus assay (5) electrophoresisanalysis of lactate dehydrogenase

isoenzymes (6) serum fructose amine assay(7) urine analysis (8) serum 7 pancreaticacyltransferase assay (9) serum alanineaminotransferase assay and (10) serum

alkaline phosphatase assay

8 Journal of Healthcare Engineering

can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice

We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree

NKQM N= 1KK

k=1

N

j=1 score Mkj log j + 1ZN

10

where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM

422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic

Table 5 Five labeled topics of IH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1Biochemical

exam

(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay

(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay

Biochemicalexam

(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay

(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay

(9) calcium assay and (10) magnesium assay

2Infectious

disease exam

(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay

(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy

Presurgicalpreparation

(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP

monitoring (9) cefuroxime and(10) infusion apparatus

3Admission

exam

(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines

(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin

(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen

Admissionexam

(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy

(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and

(10) analysis of ultrasonic

4Regular

nursing care

(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix

(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula

Regularnursing care

(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam

(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change

(9) levofloxacin and (10) middle dressing change

5Surgery

and regularnursing care

(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital

examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride

Surgery

(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application

(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring

Table 6 Topic coherence in terms of NKQMN

Dataset Method NKQM5 NKQM10 NKQM20

ICHLDA 08211 08004 07907

CDG-LDA 08448 08498 08235

IHLDA 07915 07830 07631

CDG-LDA 08316 08266 08116

9Journal of Healthcare Engineering

of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics

RE =visinΩ count v minus 1 11

whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the

(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result

Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics

423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects

(i) Examination

For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5

For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part

(ii) Treatment

Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance

3 7 11 15 19 3 7 11 15 19

0

20

40

60

80

100

ICH

0

50

100

150

200IH

LDA (N = 10) N = 10)LDA (N = 20) N = 20)

CDG-LDA (CDG-LDA (

Figure 4 Redundancy indicator RE on various topic number K and top size N

10 Journal of Healthcare Engineering

controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset

IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4

(iii) Nursing care

In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment

(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals

43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we

got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways

(i) ICH

The process can be easily divided into three stages

(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations

(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms

(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states

5 6 7 8 3

2 3

2

4

1 2 4 5

Figure 5 ICH process model

3 1

2

5 2

4

Figure 6 IH process model

11Journal of Healthcare Engineering

(ii) IH

Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery

(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP

5 Conclusion

In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation

One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC

References

[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003

[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013

[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014

[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014

[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016

[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016

[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001

[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005

[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009

[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003

[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008

[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009

[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015

[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013

[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013

[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with

12 Journal of Healthcare Engineering

visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015

[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012

[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015

[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014

[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016

[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009

[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014

13Journal of Healthcare Engineering

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 9: Incorporating Topic Assignment Constraint and Topic Correlation …downloads.hindawi.com/journals/jhe/2017/5208072.pdf · 2019-07-30 · Goal LDA (CDG-LDA). As shown in Figure 2,

can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice

We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree

NKQM N= 1KK

k=1

N

j=1 score Mkj log j + 1ZN

10

where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM

422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic

Table 5 Five labeled topics of IH by different methods

Number Topic tag LDA Topic tag CDG-LDA

1Biochemical

exam

(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay

(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay

Biochemicalexam

(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay

(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay

(9) calcium assay and (10) magnesium assay

2Infectious

disease exam

(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay

(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy

Presurgicalpreparation

(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP

monitoring (9) cefuroxime and(10) infusion apparatus

3Admission

exam

(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines

(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin

(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen

Admissionexam

(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy

(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and

(10) analysis of ultrasonic

4Regular

nursing care

(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix

(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula

Regularnursing care

(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam

(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change

(9) levofloxacin and (10) middle dressing change

5Surgery

and regularnursing care

(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital

examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride

Surgery

(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application

(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring

Table 6 Topic coherence in terms of NKQMN

Dataset Method NKQM5 NKQM10 NKQM20

ICHLDA 08211 08004 07907

CDG-LDA 08448 08498 08235

IHLDA 07915 07830 07631

CDG-LDA 08316 08266 08116

9Journal of Healthcare Engineering

of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics

RE =visinΩ count v minus 1 11

whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the

(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result

Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics

423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects

(i) Examination

For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5

For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part

(ii) Treatment

Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance

3 7 11 15 19 3 7 11 15 19

0

20

40

60

80

100

ICH

0

50

100

150

200IH

LDA (N = 10) N = 10)LDA (N = 20) N = 20)

CDG-LDA (CDG-LDA (

Figure 4 Redundancy indicator RE on various topic number K and top size N

10 Journal of Healthcare Engineering

controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset

IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4

(iii) Nursing care

In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment

(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals

43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we

got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways

(i) ICH

The process can be easily divided into three stages

(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations

(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms

(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states

5 6 7 8 3

2 3

2

4

1 2 4 5

Figure 5 ICH process model

3 1

2

5 2

4

Figure 6 IH process model

11Journal of Healthcare Engineering

(ii) IH

Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery

(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP

5 Conclusion

In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation

One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC

References

[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003

[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013

[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014

[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014

[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016

[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016

[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001

[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005

[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009

[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003

[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008

[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009

[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015

[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013

[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013

[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with

12 Journal of Healthcare Engineering

visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015

[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012

[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015

[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014

[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016

[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009

[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014

13Journal of Healthcare Engineering

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 10: Incorporating Topic Assignment Constraint and Topic Correlation …downloads.hindawi.com/journals/jhe/2017/5208072.pdf · 2019-07-30 · Goal LDA (CDG-LDA). As shown in Figure 2,

of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics

RE =visinΩ count v minus 1 11

whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the

(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result

Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics

423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects

(i) Examination

For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5

For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part

(ii) Treatment

Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance

3 7 11 15 19 3 7 11 15 19

0

20

40

60

80

100

ICH

0

50

100

150

200IH

LDA (N = 10) N = 10)LDA (N = 20) N = 20)

CDG-LDA (CDG-LDA (

Figure 4 Redundancy indicator RE on various topic number K and top size N

10 Journal of Healthcare Engineering

controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset

IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4

(iii) Nursing care

In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment

(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals

43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we

got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways

(i) ICH

The process can be easily divided into three stages

(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations

(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms

(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states

5 6 7 8 3

2 3

2

4

1 2 4 5

Figure 5 ICH process model

3 1

2

5 2

4

Figure 6 IH process model

11Journal of Healthcare Engineering

(ii) IH

Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery

(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP

5 Conclusion

In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation

One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC

References

[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003

[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013

[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014

[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014

[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016

[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016

[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001

[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005

[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009

[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003

[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008

[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009

[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015

[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013

[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013

[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with

12 Journal of Healthcare Engineering

visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015

[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012

[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015

[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014

[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016

[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009

[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014

13Journal of Healthcare Engineering

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 11: Incorporating Topic Assignment Constraint and Topic Correlation …downloads.hindawi.com/journals/jhe/2017/5208072.pdf · 2019-07-30 · Goal LDA (CDG-LDA). As shown in Figure 2,

controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset

IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4

(iii) Nursing care

In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment

(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals

43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we

got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways

(i) ICH

The process can be easily divided into three stages

(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations

(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms

(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states

5 6 7 8 3

2 3

2

4

1 2 4 5

Figure 5 ICH process model

3 1

2

5 2

4

Figure 6 IH process model

11Journal of Healthcare Engineering

(ii) IH

Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery

(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP

5 Conclusion

In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation

One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC

References

[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003

[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013

[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014

[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014

[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016

[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016

[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001

[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005

[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009

[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003

[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008

[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009

[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015

[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013

[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013

[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with

12 Journal of Healthcare Engineering

visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015

[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012

[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015

[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014

[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016

[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009

[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014

13Journal of Healthcare Engineering

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 12: Incorporating Topic Assignment Constraint and Topic Correlation …downloads.hindawi.com/journals/jhe/2017/5208072.pdf · 2019-07-30 · Goal LDA (CDG-LDA). As shown in Figure 2,

(ii) IH

Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery

(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP

5 Conclusion

In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation

One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy

Conflicts of Interest

The authors declare that there is no conflict of interestregarding the publication of this paper

Acknowledgments

This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC

References

[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003

[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013

[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014

[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014

[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016

[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016

[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001

[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005

[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009

[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003

[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008

[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009

[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012

[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015

[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013

[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013

[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with

12 Journal of Healthcare Engineering

visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015

[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012

[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015

[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014

[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016

[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009

[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014

13Journal of Healthcare Engineering

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 13: Incorporating Topic Assignment Constraint and Topic Correlation …downloads.hindawi.com/journals/jhe/2017/5208072.pdf · 2019-07-30 · Goal LDA (CDG-LDA). As shown in Figure 2,

visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015

[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012

[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015

[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014

[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016

[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009

[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014

13Journal of Healthcare Engineering

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 14: Incorporating Topic Assignment Constraint and Topic Correlation …downloads.hindawi.com/journals/jhe/2017/5208072.pdf · 2019-07-30 · Goal LDA (CDG-LDA). As shown in Figure 2,

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of