Techniques for Dimensionality Reduction Topic Models/LDA Latent Dirichlet Allocation.
Incorporating Topic Assignment Constraint and Topic Correlation...
Transcript of Incorporating Topic Assignment Constraint and Topic Correlation...
Research ArticleIncorporating Topic Assignment Constraint and TopicCorrelation Limitation into Clinical Goal Discovering for ClinicalPathway Mining
Xiao Xu Tao Jin Zhijie Wei and Jianmin Wang
School of Software Tsinghua University Beijing China
Correspondence should be addressed to Tao Jin jintao05gmailcom
Received 4 November 2016 Revised 24 January 2017 Accepted 5 February 2017 Published 22 May 2017
Academic Editor Evangelos Sakkopoulos
Copyright copy 2017 Xiao Xu et al This is an open access article distributed under the Creative Commons Attribution License whichpermits unrestricted use distribution and reproduction in any medium provided the original work is properly cited
Clinical pathways are widely used around the world for providing quality medical treatment and controlling healthcare costHowever the expert-designed clinical pathways can hardly deal with the variances among hospitals and patients It calls formore dynamic and adaptive process which is derived from various clinical data Topic-based clinical pathway mining is aneffective approach to discover a concise process model Through this approach the latent topics found by latent Dirichletallocation (LDA) represent the clinical goals And process mining methods are used to extract the temporal relations betweenthese topics However the topic quality is usually not desirable due to the low performance of the LDA in clinical data In thispaper we incorporate topic assignment constraint and topic correlation limitation into the LDA to enhance the ability ofdiscovering high-quality topics Two real-world datasets are used to evaluate the proposed method The results show that thetopics discovered by our method are with higher coherence informativeness and coverage than the original LDA These qualitytopics are suitable to represent the clinical goals Also we illustrate that our method is effective in generating a comprehensivetopic-based clinical pathway model
1 Introduction
Clinical pathway (CP) is of great importance in diseasetreatment hospital management and government regula-tion It has been widely used to make the treatment processmore standardized and normalized The core principle ofCP is to divide the clinical treatment into several stages anddetail the clinical activities in each stage Now there are morethan 300 CPs which have been designed by experts andimplemented in China Table 1 is a Chinese CP exampleabout intracerebral hemorrhage (ICH) However these CPscan be hardly put into practical application due to their staticand nonadaptive feature
Recently clinical pathwaymining (CPM) has experiencedincreased attention which refers to using data mining tech-nologies to discover the execution CP from hospital historicaldata In comparison with the expert-designed CP the data-driven execution CP is more objective It can facilitate theCP (re)design CP implementation and abnormal detection
The most current CPM approaches are based on theprocess mining technologies which are popular in extractingprocess models from event logs The resultant graph modelscontain two basic elements nodes represent clinical activitiesand edges represent the order relations between these activi-ties However the clinical data are too unstructured andcomplex for process mining methods They usually result inspaghetti-like models which are comprised by massive nodesand edges To make the process model more comprehensiveresearchers attempt to manually predefine the key activitiesfor process mining It is effective in promoting the modelconciseness and highlighting core information while themanual works are always expensive and subjective
Some other approaches focus on using topic modelingmethods to discover clinical patterns instead of processmodels They assume that the treatment for a disease canbe concluded into a number of patterns For a patient adoctor would choose one pattern for himher accordingto the symptoms To achieve this target topic modeling
HindawiJournal of Healthcare EngineeringVolume 2017 Article ID 5208072 13 pageshttpsdoiorg10115520175208072
technologies such as latent Dirichlet allocation (LDA) [1]and its variants are applied on the clinical data [2ndash5] Theymodel each patient trace as mixtures of multiple topics whileeach topic is modeled as a multinomial distribution overclinical activities However these approaches can hardlyreflect the temporal structure between clinical activitieswhich are important for CP
In our previous work [6] we take the advantages ofboth topic modeling and process mining to generate a
concise and interpretable topic-based process model Thekey principle is to use LDA to discover the latent topicsfrom the clinical data and extract the order relationsbetween these topics by process mining methods It is suit-able to regard the latent topics as the clinical goals based onthe following two clinical practices
(1) The clinical activities occurring in a specific timeduration (usually a hospitalized day) are prescribedaround several clinical goals
(2) Each clinical goal corresponds to a clinicalactivity set
Take the first day of ICH as the example (as shown inTable 1) For a new patient the doctors would make adiagnosis for himher by evaluating the intracranial pressure(ICP) and try to control the ICP into the normal range Theseconstitute the clinical goals for the first day To achieve thesegoals doctors have a series of corresponded choices BrainCT and the related medical imaging are suitable for theformer goal Dehydration drugs like mannitol and glycerinfructose are very useful for the latter goal Thus each clinicalday would be represented as several topics instead of thedetailed clinical activities Moreover the process miningmethod is effective in discovering the temporal structure inthese high-level sequences
However the raw LDA can hardly ensure the quality ofthe discovered topics which is critical to the interpretabilityof the final process model By carefully analyzing the clinicalscenario we find two common problems in applying rawLDA on clinical data
(1) The same clinical activities occurring in one clinicalday may be assigned different topics
(2) A clinical activity may have strong correlation tomany topics
We demonstrate two examples to explain them (1) InFigure 1(a) the four penicillin prescribed by the doctor inone clinical day are assigned to four different topics whilein the clinical scenario they are both used for the sameclinical goal which means that they should share the sametopic (2) In Figure 1(b) the clinical activity syringe rankshigh in all the three discovered topics However each clinicalactivity should have relative limited number clinical goalsrather than various clinical goals
To address the above problems we proposed a novelextension of LDA for CPM which is called Clinical DailyGoal LDA (CDG-LDA) As shown in Figure 2 we firstintroduce a constraint into LDA which can ensure thesame clinical activity in one clinical day would be assignedto the same topic (topic assignment constraint) It canimprove the consistency of the topic assignment procedurefor clinical goal discovering Second we present an effi-cient method to adaptively limit the topic number of eachclinical activity and adjust its ranking in the topicaccording to their informativeness (topic correlation lim-itation) It guarantees that a clinical activity would onlyrank high in limited clinical topics Third an effective
Table 1 The national clinical pathway of intracerebral hemorrhagereleased by the Ministry of Health of China
Stage Order
Stage 1(Day 1)
Long-term medical order
(1) neurology nursing routine (2) level I care(3) normal diet (4) keep the bed and (5) observingvital signs
Temporary medical order
(1) blood urine and stool routine examination(2) hepatorenal function electrolytes blood glucoseblood lipids cardiac enzymes coagulation functionand infectious disease screening (3) brain CT chestX-ray and ECG and (4) when necessary brain MRICTA MRA or DSA
Stage 2(Day 2)
Long-term medical order
(1) neurology nursing routine (2) level I care(3) normal diet (4) keep the bed (5) observing vitalsigns and (6) basic drugs
Temporary medical order
(1) re-examination for abnormal laboratory valuesand (2) when necessary re-examination CT
Stage 3(Day 3)
Long-term medical order
(1) neurology nursing routine (2) level I care(3) normal diet (4) keep the bed (5) observing vitalsigns and (6) basic drugs
Temporary medical order
(1) re-examination for abnormal laboratory values
Stage 4(Days 4ndash6)
Long-term medical order
(1) neurology nursing routine (2) level II care(3) normal diet (4) keep the bed (5) observing vitalsigns and (6) basic drugs
Temporary medical order
(1) re-examination for abnormal laboratory values(2) re-examination for blood kidney functionblood glucose and electrolytes and (3) whennecessary re-examination CT
Stage 5(Days 7ndash13)
Long-term medical order
(1) neurology nursing routine (2) level II-III care(3) normal diet (4) observing vital signs and(5) basic drugs
Temporary medical order
(1) re-examination for abnormal laboratory valuesand (2) when necessary DSA CTA and MRA
Stage 6(Days 8ndash14)
Long-term medical order
(1) discharge with drugs
2 Journal of Healthcare Engineering
process mining framework is applied on the topic-basedsequences to get a comprehensive process model It isworth mentioning that we use the billing data as ourexperimental data Compared to other various kinds ofclinical data billing data is the most common and cred-ible one due to its importance in insurance area And itcan be easily extracted from hospital information systemwithout complex data integration technologies Billingdata is also of the lowest data granularity and containsa lot of noise which brings a great difficulty for CPMtask Thus the CPM approach on the billing data isboth valuable and challenged
Our main contributions in this paper are as follows
(i) We incorporate topic assignment constraint intoLDA to make the inference procedure consis-tent for the same clinical activities in oneclinical day
(ii) Informativeness is introduced to adjust theranking in each topic which can improve thetopic quality
(iii) Experiments on real-world datasets demonstrate thegreatperformanceofourapproach forCPMbyapply-ing process mining on the quality discovered topics
We first give the definitions used in this paper
Definition 1 (clinical activity) A clinical activity is an eventoccurring at a particular time point which refers to the basicelement in treatment
Definition 2 (clinical day and patient trace) A patient tracecontains a series of clinical activities We segment eachpatient trace into a collection of clinical days and a clinicalday represents a set of clinical activities occurring in the samehospitalized day
2 Related Work
Early in 2001 Lin et al [7] pointed out the shortage of theexpert-designed CPs in handling the great variances causedby individual differences So that a more dynamic and
Penicillin penicillin penicillin penicillin and others
Antibacterial drugImaging
Nursing care
Blood exam
Topic imaging
(1) CT(2) Syringe(3) MRI
Topic nursing
(1) Syringe(2) Mouth care(3) Skin care
Topic drug
(1) Syringe(2) Penicillin(3) Vitamin
⦙ ⦙ ⦙
(a) (b)
Figure 1 Two problems in applying LDA on clinical data (a) the clinical activities in one day for a patient and (b) the three discovered topics
Visit ID Item name Amount Occurrence time
T1T1T1T1T1
T2T2
ABACD
BC
32125
42
20146122014612201461220146122014613
20151132015113
⦙ ⦙ ⦙ ⦙
⦙ ⦙ ⦙ ⦙
⦙ ⦙
⦙ ⦙⦙ ⦙
⦙ ⦙
⦙
⦙
A A A B B A C C
D D DD D
B B BB CC
Day 1
Day 2
Day 1
G2G1 G3T1 Topic 1 A 06 B 03 and others
Topic 2 C 07 D 01 and others
Topic 3 B 04 C 03 and others
Goal 1 imaging CT MRI and others
Goal 2 blood examCreatinine sodium and others
Goal 3 medication Penicillin vitamin and others
Data preparation Topic modeling Process mining
T1
Goal 1 Goal 2
Day 1 Day 2hellip
T2
Goals 3 and 1 Goal 1
Day 1 Day 2hellip
T1
Topic 1 Topic 2
Day 1 Day 2hellip
T2
Topics 3 and 1 Topic 1
Day 1 Day 2hellip
Busin
ess d
omai
nA
lgor
ithm
dom
ain
Pipeline
a1 a2 a3 a4 a5
T2
VOC
ABCD
Goals 3 and 1
Goal 1
Goal 2
Topic assignment constraint Topic correlation limitation
Topics 3 and 1
Topic 1
Topic 2
Figure 2 The illustrative process of our approach The top part is the pipeline middle part is the business domain and bottom part is thecorresponding algorithm domain (T patient trace a clinical activity day clinical day G group VOC vocabulary)
3Journal of Healthcare Engineering
adaptive process is needed for improving the performance ofCPs They proposed a graph mining technique to extract thetime dependency pattern of CPs for curing brain stroke Thepattern can be used for predicting the paths for a newpatient In [8] hidden Markov model (HMM) was adoptedfor inferring the CP pattern which was useful in knowledgesharing and CP updating While these methods wereseverely limited to the data deficiency and complexity andbecause CPs focus on the order relations between variousclinical events process mining technologies are widely usedfor CPM A case study on a group of 627 gynecologicaloncology patients was presented in [9] by using heuristicsminer [10] which resulted in a spaghetti-like process modelLang et al [11] evaluated the performance of seven tradi-tional process mining methods on clinical data The resultsdemonstrated that these methods can hardly obtain compre-hensive process models on clinical scenario
Researchers attempted to adopt frequency-basedmethodsto improve the ability of analyzing the complex processmodels In [12] the patient traces with similar schemasand outcomes are grouped together In [13] sequenceclustering was used to identify regular behavior processvariants and infrequent behavior on the process modeldiscovered by process mining method Zhang et al [14] pro-posed a frequency-based method to group the patients intoseveral predefined core dimensions Lakshmanan et al [15]presented a CPM framework that combined clusteringprocess mining and frequent pattern mining technologiesto mine an outcome-related process model In [16] asegmentation method was proposed to discover the frequentbehavior patterns with different time intervals CareExplorer[17] was a novel CP management tool which combinedfrequent sequence mining techniques with advanced visu-alization supports Manually defining a set of concernedand important clinical activities can significantly improvethe interpretability of the process models [18ndash20] How-ever manual works are always subjective and expensivefor CPM
Considering the complexity and quantity of clinical datatopic modeling technologies are used to discover the clinicalpatterns which are useful in CP (re)design and abnormaldetection Huang et al [2] firstly introduced LDA to extractlatent topics as the treatment patterns from clinical dataEach patient trace was regarded as a mixture of differenttreatment patterns In addition the team integrated timestamps [3] patient features [4] and comorbidities [5] intoLDA to enhance the ability of summarizing patterns How-ever treating a patient trace as a whole can hardly representthe feature of CP that the treatment process is consisted ofseveral stages with different clinical goals To deal with thisproblem our previous work [6] puts the emphasis on dis-covering the clinical goals from clinical data by LDA andderiving the order relations between these goals by processmining method It was effective in generating a conciseordinal and interpretable process model as CP While asdiscussed in [21] the performance of this approach isquite limited to the effectiveness of LDA
In this paper we extend the work in [21] to enhance theability of discovering quality clinical goals by incorporating
the topic assignment constraint and topic correlation limita-tion into LDA
3 Methodology
We first introduce the conversion from billing data to theword-count format for LDA Then we present a brief reviewof applying LDA on clinical data After analyzing the twoshortages of the raw LDA we incorporate topic assignmentconstraint and topic correlation limitation into LDA toimprove the performance of discovering latent topics fromclinical data
31 Data Preparation Billing data is the most commondata recorded in hospital information system (an exam-ple is shown in the upper-left part of Figure 2) Fourbasic attributions are essential for the topic modelingtrace ID (the identifier for one patient trace) itemname amount and occurrence time (in the unit ofday) Each billing item (one row in the table) representsa collection of clinical activities such as the first rowltA 3gt means there are three clinical activity A Notethat the amount of different kinds of items should benormalized into the same scale first We denote theclinical activity domain as the vocabulary The billingitems with the same trace ID and occurrence timeconstitute one clinical day With respect to LDA theclinical day with clinical activities corresponds to thedocument with words
32 Brief Introduction for LDA LDA is one of the mostpopular statistical topic modeling technologies It modelsthe generative process of each word from each documentin a text dataset There are two core model parametersin LDA a corpus-level distribution over words for eachtopic and a document-level distribution over topics foreach document In clinical data by regarding the clinicalday and clinical activity as the document and wordrespectively the generative process of LDA has great con-formity with the two abovementioned clinical practicesThe graphical representation is shown in Figure 3(a) andthe notations are explained in Table 2 The generative processof LDA is as follows
(1) Draw distribution ϕksimDir β k = 1 2hellip K(2) For dth clinical day d = 1 2hellipD
(a) Draw distribution θdsimDir α
(b) For ith clinical activity in dth clinical dayi = 1 2hellipNd
(i) Draw zdisimMulti θd(ii) Draw adisimMulti ϕzdi
According to the graphical model we need an infer-ence to find the parameters which can best match theobserved clinical data Gibbs sampling a special case ofMarkov chain Monte Carlo (MCMC) is the most popular
4 Journal of Healthcare Engineering
inference method for LDA It is based on the joint dis-tribution of latent topics and observed clinical activitiesas follows
P Z A α β = P Z α sdotP A Z β
= P Z θ P θ α dθ sdot P A Z ϕ P ϕ β dϕ
prop prodK
k=1prodD
d=1Γ N k
d + αk sdotprodV
v=1Γ N vk + βv
Γ Nk +V
v=1βv
1where Γ middot is the gamma function
Given the joint distribution the topic assignment foreach clinical activity can be written as follows
P zdi = k Z zdi A = P Z AP Z zdi A
propN v
k zdi + βv
V
v=1Nvk zdi + βv
sdot N kd zdi + αk
2
After the Gibbs sampling the two important hiddenparameters ϕ and θ are computed as follows
ϕ vk = N v
k + βv
V
v=1Nvk + βv
θ kd = N k
d + αk
K
k=1Nkd + αk
3
where ϕ vk is the probability of clinical activity v which
belongs to topic k and θ kd is the probability of the dth clinical
day which contains topic k
33 Topic Assignment Constraint We can see that theinference procedure is mutually independent for all theclinical activities (as shown in (2)) So that even the sameclinical activities in one clinical day may get different topicswhich violates to the clinical practice
In this section we adopt the chain graph proposed in [21]to incorporate topic assignment constraint into LDA It cancoerce the same clinical activities on one clinical day to
Table 2 Meanings of the notations
Notation Meaning
D The set of clinical days
A The set of clinical activities
Z The set of topics assigned to A
D K The number of clinical days and topics
V The number of unique clinical activities
Nd The number of clinical activities in dth clinical day
NkThe number of clinical activities that are assigned
topic k
GdThe number of groups (unique clinical activities)
in dth clinical day
Agd
The number of clinical activities in gth group indth clinical day
adi The ith clinical activities in dth clinical day
zdi The topic for adiagd The clinical activities of gth group in dth clinical day
agdj The jth clinical activity in gth group in dth clinical day
zgdj The topic for agdj
zgdzgdj
Agd
j=1 the set of topics for gth group in dth
clinical day
N kd
The number of clinical activities that are assignedtopic k in dth clinical day
N vk
The number of clinical activities with value v and topic k
N kd zgd
The number of clinical activities that are assigned topick in dth clinical day except gth group in dth clinical day
N vk zgd
The number of clinical activities with value v andtopic k except gth group in dth clinical day
α β Dirichlet prior vector
ϕ θ Multinomial distribution over clinical activitiesand topics
ϕkMultinomial distribution over clinical activities for
topic k
θd Multinomial distribution over topics for dth clinical day
훼
훽
휃d
k
zdi
adi
i
d
kisin[1 K] [1 Nd][1 D]
isin
isin
d
k
d1
d1
d
k [1 K]
d2
d2
ds
ds
s= d
g g g
g g g
g [1Gd][1D]
isin isin
isin
(a) (b)
Figure 3 Graphical representation of (a) LDA and (b) CDG-LDA
5Journal of Healthcare Engineering
take on the same topic The graphical model is shown inFigure 3(b) In each clinical day we first put all the sameclinical activities into a group and then use indirect edgesto connect them A function Cg
d is imposed to express thetopic assignment constraint as follows
C zgd = 1 if zgd1 = zgd2 =⋯ = zgdAg
d
0 otherwise4
For each group Cgd would equal to 1 only when all the
clinical activities in the group get the same topic Our goalis to make all the groups conform to the constraint Sothat we add Cg
d in the joint distribution of LDA (1) as follows
Pprime Z A = 1Δ P Z A prod
dgC zgd 5
where Δ is the normalization constantSimilar to the Gibbs sampling algorithm for LDA we
sample a topic for a group based on its posteriorPprime zgd = k Z zgd
A where zgd = k represents the set zgd which
has the same topic k (the notations are explained in Table 2)
Pprime zgd = k Z zgd A = P Z A
P Z zgd A
propΓ N k
d + αk
Γ N kd zgd
+ αk
sdotΓ N
αgd
k + βαgd
Γ Nk +V
vβv
Γ Nαgd
k zgd+ βα
gd
Γ Nk zgd+V
vβv
=Γ N k
d zgd+ Ag
d + αk
Γ N kd zgd
+ αk
sdotΓ N
agdk zgd
+ Agd + βα
gd
Γ Nagd
k zgd+ βα
gd
sdotΓ Nk zgd
+V
vβv
Γ Nk zgd+ Ag
d +V
vβv
= prodAgd
j=1N k
k zgd+ αk + jminus 1
sdotN
αgd
k zgd+ βα
gd+ jminus 1
Nk zgd+V
vβv + jminus 1
6
34 Topic Correlation Limitation The ranking of a clinical
activity v in a topic k is determined by ϕ vk From (3) it is
observed thatN vk is critical to the calculation of ϕ v
k In addi-
tion the sum of N vk among all topics is a fixed value which
equals to the frequency of clinical activities v (denoted as
N v ) in all clinical days Due to the independence of theinference procedure of LDA the frequency of v may be uni-formly allocated to various topics Thus a high-frequencyclinical activity is much easier to rank high in many topicswhile a low-frequency clinical activity can hardly rank highin even one topic It results in a large redundancy betweendifferent topics so that each topic is lacking of characteristicsIt is hard to regard these topics as the clinical goals
We consider using informativeness feature to solve thisproblem Our insight is that for an informative clinicalactivity it should rank high in a small number of topicswhich means that it can better represent these clinical goalsthan other uninformative clinical activities We incorporatetopic correlation limitation into LDA to achieve this targetThis limitation contains two aspects
(1) Restrict the distribution of N v over topics Higherinformativeness and less relevant topics
(2) Associate the calculation of ϕ vk with the informative-
ness of v
We use inverse document frequency (IDF see (7)) tomeasure the informativeness of each clinical activity Ingeneral informative clinical activities are expected to havehigher IDF and hence less relevant topics and higher ranking
IDF v = log Dd isin 1D v isinDd
7
Accordingly the new calculation of ϕ vk is denoted
as follows
ϕvk = ϕ v
k sdot IDF v 8
The core principle for the first aspect is to keep discardingthe most irrelevant topic for a clinical activity It is achievedby maintaining a relevant topic list for each clinical activityduring the iteration of Gibbs sampling The pseudocode forthis procedure is shown in Algorithm 1
The topic number upper bounder refers to the maximumnumber of the topics a clinical activity can correlate to Forexample given a clinical activity ECG and its topic numberupper bounder s ECG = 3 its N v can be only allocated tothree topics We set s v from 1 to K according to its IDFThen for each clinical activity value v we maintain acorresponding relevant topic list κ v In every δ iterationwe check whether the size of κ v has been reduced to thedesired number s v (line 12) To the clinical activities whichstill need to be restricted we will update their lists (line 13) bydiscarding the topic with minimum relevant value (RV) asdefined in
RV k v = N vk
rank k v 9
where rank k v is the ranking of clinical activity v in topic kbased on the multinomial distribution ϕk The numerator
N vk reflects the relevant degree of topic k to clinical activity
6 Journal of Healthcare Engineering
v And the denominator rank k v reflects the importance ofclinical activity v to topic k Thus higher RV represent morecorrelation between the clinical activity and the topic
By keeping on updating the relevant topic lists wegradually reduce the dimension of topic selection to thedesired range For instance suppose there are K = 5 topicsand a clinical activity ECG with a RV vector 50 150 5100 60 and s ECG = 3 the relevant topic list k ECGwould be changed from 1 2 3 4 5 to 1 2 4 5 in the nextupdating procedure As seen in line 15 of Algorithm 1 we dothe topic sampling from relevant topic list instead of all thetopics To the preinstance it means that we would sample atopic from topics 1 2 4 and 5 without topic 3 because ofits minimum RV
The complexity of deriving a sample zgd is K where K is the number of topics Therefore the overall com-plexity is KδGK = K2δG δ is the number of iterationfor updating relevant topic list and G =sumD
d Gd In practice Kwould not be a large value and δ could be treated as a con-stant so that G is the key factor that contributes to thecomplexity It means that the proposed method scales lin-early as we increase the total number of groups
35 Process Mining Instead of the detailed and complexclinical activities we use the discovered topics to representeach clinical day Hence each patient can be regarded as atopic-based sequence We adopt the process mining frame-work proposed in [6] to derive a concise and interpretableprocess model
4 Experiments
In this section we conducted a series of experiments todemonstrate the effectiveness of the proposed methods indiscovering quality CP model We begin with the descriptionof experimental settings
41 Experimental Settings To evaluate both the suitabilityand generality of our approach we collected two real-world billing datasets from a city-centre hospital Thetwo datasets are about two different diseases an internaldiseasemdashintracerebral hemorrhage (ICH) and a surgicaldiseasemdashinguinal hernia (IH) The detailed statistics aresummarized in Table 3 We set the Dirichlet prior toα = 1 0 and β = 0 01
The topic number K is determined by a trade-off strategyproposed in [6] It attempt to find a balance between theperplexity of topic modeling and the topic label size Theformer one is commonly used in evaluating the topic modelsrsquoprediction ability It is defined as the reciprocal geometricmean of the likelihood of the data Lower perplexity refersto better predictiveness and hence a better K while perplexityis not strongly correlated to human judgement [22] In ourapproach the size of topic label is critical to the concisenessof the final process model High topic number K wouldsignificantly reduce the interpretability of the model Thuswe select the intersection point of the two metrics acrossvarious K as the optimal topic number (K = 8 for ICH andK = 5 for IH) The iterations for updating relevant topic listδ are set to 300
42 Topic Quality The latent topics discovered by topicmodeling are interpreted as the clinical goals in ourapproach To evaluate the quality of the topics we com-pare our approach (CDG-LDA) against LDA [6] in threeaspects coherence redundancy and coverage Tables 4and 5 show the clustering results of the two methodsFor each topic we listed the top N = 10 ranked clinicalactivities and asked doctors to label it The similar topicsgenerated by different methods are in the same row Notethat given a top size N we defined (total top list) as allthe top N-ranked clinical activities among K topics Forexample refers to the 80 clinical activities for ICHwhen N = 10 and K = 8
421 Coherence We expect that the clinical activities in atopic could represent the similar clinical goal Take topic 5(the 5th row in Table 5) as the example It is observed thatthe result of LDA contains both surgery- and nursing care-related activities While in the result of CDG-LDA most ofthe clinical activities are about surgery The similar situations
Table 3 Statistics of our datasets
DiseaseTracenumber
Daynumber
Vocnumber
AvgLOS
MinLOS
MaxLOS
ICH 240 3204 752 14 2 34
IH 33 241 447 6 2 10
Input A dataset of D clinical daysTopic number K Hyper-parameters α and βIterations δ for updating relevant topic list
Output The multinomial distribution ϕ and θ1 Calculate IDF for each clinical activity2 Initialize the topic number upper bounder s v isin 1 K
for clinical activity with value v according to its IDF3 Initialize the relevant topic list κ v = 1 2hellip K for
clinical activity with value v4 Sample zgd for each group and Initialize all the count
parameters NkNkd and N v
k accordingly
5 for l = 1 to K do6 if l gt 1 then7 for v = 1 to V do8 if K minus l + 1 ge s v then9 Update κ v 10 for r = 1 to δ do11 for d = 1 to D do12 for g = 1 to Gd do13 k = zgd 14 Nk minus = Ag
d Nkd minus = Ag
d Nvk minus = Ag
d 15 Sample topic index kprime from κ v
according to (6)
16 Nkprime + = Agd N
kprimed + = Ag
d Nv
kprime + = Agd
17 Calculate and return ϕ (based on 8) and β
Algorithm 1 Gibbs sampling of CDG-LDA
7Journal of Healthcare Engineering
Table 4 Eight labeled topics of ICH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1 Imaging
(1) Doppler echocardiography (2) Dopplerultrasonography of left ventricular function(3) ventricular wall motion (4) transcranialDoppler (5) analysis of ultrasonic (6) color
Doppler ultrasonography of abdomen(7) X-radiography (8) digitized
photography (9) B mode ultrasoundand (10) film
Imaging
(1) Doppler echocardiography(2) Doppler ultrasonography of
left ventricular function (3) ventricularwall motion (4) transcranial Doppler(5) analysis of ultrasonic (6) color
Doppler ultrasonography of the abdomen(7) X-radiography (8) B mode ultrasound(9) digitized photography and (10) film
2 Medication
(1) Aceglutamide (2) sodium chloride(3) local infiltration anesthesia (4) etimicin(5) cerebrosidekinin (6) physical cooling(7) t-branch pipe (8) aerosol inhalation(9) ambroxol and (10) venous transfusion
Medication
(1) Pantoprazole (2) potassium magnesiumaspartate (3) mannitol (4) vitamin B6(5) glycerin fructose (6) vitamin C
(7) naloxone (8) piracetam (9) sodiumchloride and (10) glucose
3High-levelnursing care
(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) urinary meatus care (5) mouth care(6) skin care (7) hospital examining fee(8) mouth care package (9) ambulatory
blood pressure monitoring and(10) continuous oxygen inhalation
High-levelnursing care
(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) mouth care (5) urinary meatus care(6) skin care (7) continuous oxygeninhalation (8) mouth care package
(9) ambulatory blood pressure monitoringand (10) drainage pack
4Regular nursing
care andmedication
(1) Level II care (2) hospital examining fee(3) pantoprazole (4) vitamin B6
(5) mannitol (6) vitamin C (7) potassiummagnesium aspartate (8) venous transfusion
(9) sodium chloride and (10) glycerinfructose
Regularnursing care
(1) Level II care (2) venous transfusion(3) flusher (4) syringe (5) arterialvenous
cannula (6) therapy application(7) aceglutamide (8) intramuscular
injection (9) physical cooling venipunctureand (10) nifedipine
5Admissionexamination
(1) ECG (2) PLGA (3) fibrous protein(4) blood collection tube (5) hemostix
(6) washbasin (7) ECG event log(8) plasma prothrombin time assay
(9) activated partial thromboplastin timeand (10) prothrombin time assay
Admissionexamination
(1) Brain CT (2) venous sampling (3) ECG(4) hemostix (5) electrode (6) 12 channeldynamic electrocardiogram (7) washbasin(8) ECG event log (9) nasogastric and(10) evaluation of activities of daily living
6Biochemical
exam
(1) Urea assay (2) creatinine assay (3) uricacid assay (4) chloride assay (5) blood
cell analysis (6) potassium assay (7) sodiumassay (8) glucose assay (9) serum
bicarbonate assay and (10) excrement
Biochemicalexam
(1) Creatinine assay (2) urea assay(3) uric acid assay (4) blood cell
analysis (5) chloride assay (6) sodiumassay (7) potassium assay (8) serumbicarbonate assay (9) glucose assay
and (10) calcium assay
7Biochemical
exam
(1) Serum a-L-glucosidase assay (2) serum5primenucleotidase assay (3) plasma viscosityassay (4) glycosylated hemoglobin assay(5) whole blood viscosity (6) identification
of Rh blood group antigen (7) amylase assay(8) serum creatine kinase-MB isoenzyme
activity assay (9) lactate dehydrogenase assayand (10) serum alanine aminotransferase assay
Biochemicalexam
(1) Serum 5primenucleotidase assay (2) seruma-L-glucosidase assay (3) plasma viscosity
assay (4) serum creatine kinase-MBisoenzyme activity assay (5) glycosylated
hemoglobin assay (6) whole blood viscosity(7) identification of Rh blood group antigen(8) amylase assay (9) lactate dehydrogenaseassay and (10) serum total protein assay
8Biochemical
exam
(1) Serum aspartate aminotransferase assay(2) electrophoresis analysis of lactatedehydrogenase isoenzymes (3) urine
analysis (4) serum 7 pancreaticacyltransferase assay (5) serum alkalinephosphatase assay (6) serum alanineaminotransferase assay (7) urinalysis
(8) hepatitis A antibody assay (9) serumhigh-density lipoprotein cholesterol assayand (10) serum total cholesterol assay
Biochemicalexam
(1) Serum aspartate aminotransferase assay(2) serum albumin assay (3) urine sediment
quantitative analyze (4) inorganicphosphorus assay (5) electrophoresisanalysis of lactate dehydrogenase
isoenzymes (6) serum fructose amine assay(7) urine analysis (8) serum 7 pancreaticacyltransferase assay (9) serum alanineaminotransferase assay and (10) serum
alkaline phosphatase assay
8 Journal of Healthcare Engineering
can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice
We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree
NKQM N= 1KK
k=1
N
j=1 score Mkj log j + 1ZN
10
where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM
422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic
Table 5 Five labeled topics of IH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1Biochemical
exam
(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay
(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay
Biochemicalexam
(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay
(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay
(9) calcium assay and (10) magnesium assay
2Infectious
disease exam
(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay
(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy
Presurgicalpreparation
(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP
monitoring (9) cefuroxime and(10) infusion apparatus
3Admission
exam
(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines
(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin
(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen
Admissionexam
(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy
(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and
(10) analysis of ultrasonic
4Regular
nursing care
(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix
(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula
Regularnursing care
(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam
(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change
(9) levofloxacin and (10) middle dressing change
5Surgery
and regularnursing care
(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital
examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride
Surgery
(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application
(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring
Table 6 Topic coherence in terms of NKQMN
Dataset Method NKQM5 NKQM10 NKQM20
ICHLDA 08211 08004 07907
CDG-LDA 08448 08498 08235
IHLDA 07915 07830 07631
CDG-LDA 08316 08266 08116
9Journal of Healthcare Engineering
of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics
RE =visinΩ count v minus 1 11
whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the
(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result
Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics
423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects
(i) Examination
For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5
For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part
(ii) Treatment
Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance
3 7 11 15 19 3 7 11 15 19
0
20
40
60
80
100
ICH
0
50
100
150
200IH
LDA (N = 10) N = 10)LDA (N = 20) N = 20)
CDG-LDA (CDG-LDA (
Figure 4 Redundancy indicator RE on various topic number K and top size N
10 Journal of Healthcare Engineering
controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset
IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4
(iii) Nursing care
In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment
(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals
43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we
got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways
(i) ICH
The process can be easily divided into three stages
(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations
(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms
(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states
5 6 7 8 3
2 3
2
4
1 2 4 5
Figure 5 ICH process model
3 1
2
5 2
4
Figure 6 IH process model
11Journal of Healthcare Engineering
(ii) IH
Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery
(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP
5 Conclusion
In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation
One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy
Conflicts of Interest
The authors declare that there is no conflict of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC
References
[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003
[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013
[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014
[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014
[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016
[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016
[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001
[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005
[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009
[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003
[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008
[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009
[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012
[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015
[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013
[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013
[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with
12 Journal of Healthcare Engineering
visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015
[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012
[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015
[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014
[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016
[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009
[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014
13Journal of Healthcare Engineering
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal of
Volume 201
Submit your manuscripts athttpswwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
technologies such as latent Dirichlet allocation (LDA) [1]and its variants are applied on the clinical data [2ndash5] Theymodel each patient trace as mixtures of multiple topics whileeach topic is modeled as a multinomial distribution overclinical activities However these approaches can hardlyreflect the temporal structure between clinical activitieswhich are important for CP
In our previous work [6] we take the advantages ofboth topic modeling and process mining to generate a
concise and interpretable topic-based process model Thekey principle is to use LDA to discover the latent topicsfrom the clinical data and extract the order relationsbetween these topics by process mining methods It is suit-able to regard the latent topics as the clinical goals based onthe following two clinical practices
(1) The clinical activities occurring in a specific timeduration (usually a hospitalized day) are prescribedaround several clinical goals
(2) Each clinical goal corresponds to a clinicalactivity set
Take the first day of ICH as the example (as shown inTable 1) For a new patient the doctors would make adiagnosis for himher by evaluating the intracranial pressure(ICP) and try to control the ICP into the normal range Theseconstitute the clinical goals for the first day To achieve thesegoals doctors have a series of corresponded choices BrainCT and the related medical imaging are suitable for theformer goal Dehydration drugs like mannitol and glycerinfructose are very useful for the latter goal Thus each clinicalday would be represented as several topics instead of thedetailed clinical activities Moreover the process miningmethod is effective in discovering the temporal structure inthese high-level sequences
However the raw LDA can hardly ensure the quality ofthe discovered topics which is critical to the interpretabilityof the final process model By carefully analyzing the clinicalscenario we find two common problems in applying rawLDA on clinical data
(1) The same clinical activities occurring in one clinicalday may be assigned different topics
(2) A clinical activity may have strong correlation tomany topics
We demonstrate two examples to explain them (1) InFigure 1(a) the four penicillin prescribed by the doctor inone clinical day are assigned to four different topics whilein the clinical scenario they are both used for the sameclinical goal which means that they should share the sametopic (2) In Figure 1(b) the clinical activity syringe rankshigh in all the three discovered topics However each clinicalactivity should have relative limited number clinical goalsrather than various clinical goals
To address the above problems we proposed a novelextension of LDA for CPM which is called Clinical DailyGoal LDA (CDG-LDA) As shown in Figure 2 we firstintroduce a constraint into LDA which can ensure thesame clinical activity in one clinical day would be assignedto the same topic (topic assignment constraint) It canimprove the consistency of the topic assignment procedurefor clinical goal discovering Second we present an effi-cient method to adaptively limit the topic number of eachclinical activity and adjust its ranking in the topicaccording to their informativeness (topic correlation lim-itation) It guarantees that a clinical activity would onlyrank high in limited clinical topics Third an effective
Table 1 The national clinical pathway of intracerebral hemorrhagereleased by the Ministry of Health of China
Stage Order
Stage 1(Day 1)
Long-term medical order
(1) neurology nursing routine (2) level I care(3) normal diet (4) keep the bed and (5) observingvital signs
Temporary medical order
(1) blood urine and stool routine examination(2) hepatorenal function electrolytes blood glucoseblood lipids cardiac enzymes coagulation functionand infectious disease screening (3) brain CT chestX-ray and ECG and (4) when necessary brain MRICTA MRA or DSA
Stage 2(Day 2)
Long-term medical order
(1) neurology nursing routine (2) level I care(3) normal diet (4) keep the bed (5) observing vitalsigns and (6) basic drugs
Temporary medical order
(1) re-examination for abnormal laboratory valuesand (2) when necessary re-examination CT
Stage 3(Day 3)
Long-term medical order
(1) neurology nursing routine (2) level I care(3) normal diet (4) keep the bed (5) observing vitalsigns and (6) basic drugs
Temporary medical order
(1) re-examination for abnormal laboratory values
Stage 4(Days 4ndash6)
Long-term medical order
(1) neurology nursing routine (2) level II care(3) normal diet (4) keep the bed (5) observing vitalsigns and (6) basic drugs
Temporary medical order
(1) re-examination for abnormal laboratory values(2) re-examination for blood kidney functionblood glucose and electrolytes and (3) whennecessary re-examination CT
Stage 5(Days 7ndash13)
Long-term medical order
(1) neurology nursing routine (2) level II-III care(3) normal diet (4) observing vital signs and(5) basic drugs
Temporary medical order
(1) re-examination for abnormal laboratory valuesand (2) when necessary DSA CTA and MRA
Stage 6(Days 8ndash14)
Long-term medical order
(1) discharge with drugs
2 Journal of Healthcare Engineering
process mining framework is applied on the topic-basedsequences to get a comprehensive process model It isworth mentioning that we use the billing data as ourexperimental data Compared to other various kinds ofclinical data billing data is the most common and cred-ible one due to its importance in insurance area And itcan be easily extracted from hospital information systemwithout complex data integration technologies Billingdata is also of the lowest data granularity and containsa lot of noise which brings a great difficulty for CPMtask Thus the CPM approach on the billing data isboth valuable and challenged
Our main contributions in this paper are as follows
(i) We incorporate topic assignment constraint intoLDA to make the inference procedure consis-tent for the same clinical activities in oneclinical day
(ii) Informativeness is introduced to adjust theranking in each topic which can improve thetopic quality
(iii) Experiments on real-world datasets demonstrate thegreatperformanceofourapproach forCPMbyapply-ing process mining on the quality discovered topics
We first give the definitions used in this paper
Definition 1 (clinical activity) A clinical activity is an eventoccurring at a particular time point which refers to the basicelement in treatment
Definition 2 (clinical day and patient trace) A patient tracecontains a series of clinical activities We segment eachpatient trace into a collection of clinical days and a clinicalday represents a set of clinical activities occurring in the samehospitalized day
2 Related Work
Early in 2001 Lin et al [7] pointed out the shortage of theexpert-designed CPs in handling the great variances causedby individual differences So that a more dynamic and
Penicillin penicillin penicillin penicillin and others
Antibacterial drugImaging
Nursing care
Blood exam
Topic imaging
(1) CT(2) Syringe(3) MRI
Topic nursing
(1) Syringe(2) Mouth care(3) Skin care
Topic drug
(1) Syringe(2) Penicillin(3) Vitamin
⦙ ⦙ ⦙
(a) (b)
Figure 1 Two problems in applying LDA on clinical data (a) the clinical activities in one day for a patient and (b) the three discovered topics
Visit ID Item name Amount Occurrence time
T1T1T1T1T1
T2T2
ABACD
BC
32125
42
20146122014612201461220146122014613
20151132015113
⦙ ⦙ ⦙ ⦙
⦙ ⦙ ⦙ ⦙
⦙ ⦙
⦙ ⦙⦙ ⦙
⦙ ⦙
⦙
⦙
A A A B B A C C
D D DD D
B B BB CC
Day 1
Day 2
Day 1
G2G1 G3T1 Topic 1 A 06 B 03 and others
Topic 2 C 07 D 01 and others
Topic 3 B 04 C 03 and others
Goal 1 imaging CT MRI and others
Goal 2 blood examCreatinine sodium and others
Goal 3 medication Penicillin vitamin and others
Data preparation Topic modeling Process mining
T1
Goal 1 Goal 2
Day 1 Day 2hellip
T2
Goals 3 and 1 Goal 1
Day 1 Day 2hellip
T1
Topic 1 Topic 2
Day 1 Day 2hellip
T2
Topics 3 and 1 Topic 1
Day 1 Day 2hellip
Busin
ess d
omai
nA
lgor
ithm
dom
ain
Pipeline
a1 a2 a3 a4 a5
T2
VOC
ABCD
Goals 3 and 1
Goal 1
Goal 2
Topic assignment constraint Topic correlation limitation
Topics 3 and 1
Topic 1
Topic 2
Figure 2 The illustrative process of our approach The top part is the pipeline middle part is the business domain and bottom part is thecorresponding algorithm domain (T patient trace a clinical activity day clinical day G group VOC vocabulary)
3Journal of Healthcare Engineering
adaptive process is needed for improving the performance ofCPs They proposed a graph mining technique to extract thetime dependency pattern of CPs for curing brain stroke Thepattern can be used for predicting the paths for a newpatient In [8] hidden Markov model (HMM) was adoptedfor inferring the CP pattern which was useful in knowledgesharing and CP updating While these methods wereseverely limited to the data deficiency and complexity andbecause CPs focus on the order relations between variousclinical events process mining technologies are widely usedfor CPM A case study on a group of 627 gynecologicaloncology patients was presented in [9] by using heuristicsminer [10] which resulted in a spaghetti-like process modelLang et al [11] evaluated the performance of seven tradi-tional process mining methods on clinical data The resultsdemonstrated that these methods can hardly obtain compre-hensive process models on clinical scenario
Researchers attempted to adopt frequency-basedmethodsto improve the ability of analyzing the complex processmodels In [12] the patient traces with similar schemasand outcomes are grouped together In [13] sequenceclustering was used to identify regular behavior processvariants and infrequent behavior on the process modeldiscovered by process mining method Zhang et al [14] pro-posed a frequency-based method to group the patients intoseveral predefined core dimensions Lakshmanan et al [15]presented a CPM framework that combined clusteringprocess mining and frequent pattern mining technologiesto mine an outcome-related process model In [16] asegmentation method was proposed to discover the frequentbehavior patterns with different time intervals CareExplorer[17] was a novel CP management tool which combinedfrequent sequence mining techniques with advanced visu-alization supports Manually defining a set of concernedand important clinical activities can significantly improvethe interpretability of the process models [18ndash20] How-ever manual works are always subjective and expensivefor CPM
Considering the complexity and quantity of clinical datatopic modeling technologies are used to discover the clinicalpatterns which are useful in CP (re)design and abnormaldetection Huang et al [2] firstly introduced LDA to extractlatent topics as the treatment patterns from clinical dataEach patient trace was regarded as a mixture of differenttreatment patterns In addition the team integrated timestamps [3] patient features [4] and comorbidities [5] intoLDA to enhance the ability of summarizing patterns How-ever treating a patient trace as a whole can hardly representthe feature of CP that the treatment process is consisted ofseveral stages with different clinical goals To deal with thisproblem our previous work [6] puts the emphasis on dis-covering the clinical goals from clinical data by LDA andderiving the order relations between these goals by processmining method It was effective in generating a conciseordinal and interpretable process model as CP While asdiscussed in [21] the performance of this approach isquite limited to the effectiveness of LDA
In this paper we extend the work in [21] to enhance theability of discovering quality clinical goals by incorporating
the topic assignment constraint and topic correlation limita-tion into LDA
3 Methodology
We first introduce the conversion from billing data to theword-count format for LDA Then we present a brief reviewof applying LDA on clinical data After analyzing the twoshortages of the raw LDA we incorporate topic assignmentconstraint and topic correlation limitation into LDA toimprove the performance of discovering latent topics fromclinical data
31 Data Preparation Billing data is the most commondata recorded in hospital information system (an exam-ple is shown in the upper-left part of Figure 2) Fourbasic attributions are essential for the topic modelingtrace ID (the identifier for one patient trace) itemname amount and occurrence time (in the unit ofday) Each billing item (one row in the table) representsa collection of clinical activities such as the first rowltA 3gt means there are three clinical activity A Notethat the amount of different kinds of items should benormalized into the same scale first We denote theclinical activity domain as the vocabulary The billingitems with the same trace ID and occurrence timeconstitute one clinical day With respect to LDA theclinical day with clinical activities corresponds to thedocument with words
32 Brief Introduction for LDA LDA is one of the mostpopular statistical topic modeling technologies It modelsthe generative process of each word from each documentin a text dataset There are two core model parametersin LDA a corpus-level distribution over words for eachtopic and a document-level distribution over topics foreach document In clinical data by regarding the clinicalday and clinical activity as the document and wordrespectively the generative process of LDA has great con-formity with the two abovementioned clinical practicesThe graphical representation is shown in Figure 3(a) andthe notations are explained in Table 2 The generative processof LDA is as follows
(1) Draw distribution ϕksimDir β k = 1 2hellip K(2) For dth clinical day d = 1 2hellipD
(a) Draw distribution θdsimDir α
(b) For ith clinical activity in dth clinical dayi = 1 2hellipNd
(i) Draw zdisimMulti θd(ii) Draw adisimMulti ϕzdi
According to the graphical model we need an infer-ence to find the parameters which can best match theobserved clinical data Gibbs sampling a special case ofMarkov chain Monte Carlo (MCMC) is the most popular
4 Journal of Healthcare Engineering
inference method for LDA It is based on the joint dis-tribution of latent topics and observed clinical activitiesas follows
P Z A α β = P Z α sdotP A Z β
= P Z θ P θ α dθ sdot P A Z ϕ P ϕ β dϕ
prop prodK
k=1prodD
d=1Γ N k
d + αk sdotprodV
v=1Γ N vk + βv
Γ Nk +V
v=1βv
1where Γ middot is the gamma function
Given the joint distribution the topic assignment foreach clinical activity can be written as follows
P zdi = k Z zdi A = P Z AP Z zdi A
propN v
k zdi + βv
V
v=1Nvk zdi + βv
sdot N kd zdi + αk
2
After the Gibbs sampling the two important hiddenparameters ϕ and θ are computed as follows
ϕ vk = N v
k + βv
V
v=1Nvk + βv
θ kd = N k
d + αk
K
k=1Nkd + αk
3
where ϕ vk is the probability of clinical activity v which
belongs to topic k and θ kd is the probability of the dth clinical
day which contains topic k
33 Topic Assignment Constraint We can see that theinference procedure is mutually independent for all theclinical activities (as shown in (2)) So that even the sameclinical activities in one clinical day may get different topicswhich violates to the clinical practice
In this section we adopt the chain graph proposed in [21]to incorporate topic assignment constraint into LDA It cancoerce the same clinical activities on one clinical day to
Table 2 Meanings of the notations
Notation Meaning
D The set of clinical days
A The set of clinical activities
Z The set of topics assigned to A
D K The number of clinical days and topics
V The number of unique clinical activities
Nd The number of clinical activities in dth clinical day
NkThe number of clinical activities that are assigned
topic k
GdThe number of groups (unique clinical activities)
in dth clinical day
Agd
The number of clinical activities in gth group indth clinical day
adi The ith clinical activities in dth clinical day
zdi The topic for adiagd The clinical activities of gth group in dth clinical day
agdj The jth clinical activity in gth group in dth clinical day
zgdj The topic for agdj
zgdzgdj
Agd
j=1 the set of topics for gth group in dth
clinical day
N kd
The number of clinical activities that are assignedtopic k in dth clinical day
N vk
The number of clinical activities with value v and topic k
N kd zgd
The number of clinical activities that are assigned topick in dth clinical day except gth group in dth clinical day
N vk zgd
The number of clinical activities with value v andtopic k except gth group in dth clinical day
α β Dirichlet prior vector
ϕ θ Multinomial distribution over clinical activitiesand topics
ϕkMultinomial distribution over clinical activities for
topic k
θd Multinomial distribution over topics for dth clinical day
훼
훽
휃d
k
zdi
adi
i
d
kisin[1 K] [1 Nd][1 D]
isin
isin
d
k
d1
d1
d
k [1 K]
d2
d2
ds
ds
s= d
g g g
g g g
g [1Gd][1D]
isin isin
isin
(a) (b)
Figure 3 Graphical representation of (a) LDA and (b) CDG-LDA
5Journal of Healthcare Engineering
take on the same topic The graphical model is shown inFigure 3(b) In each clinical day we first put all the sameclinical activities into a group and then use indirect edgesto connect them A function Cg
d is imposed to express thetopic assignment constraint as follows
C zgd = 1 if zgd1 = zgd2 =⋯ = zgdAg
d
0 otherwise4
For each group Cgd would equal to 1 only when all the
clinical activities in the group get the same topic Our goalis to make all the groups conform to the constraint Sothat we add Cg
d in the joint distribution of LDA (1) as follows
Pprime Z A = 1Δ P Z A prod
dgC zgd 5
where Δ is the normalization constantSimilar to the Gibbs sampling algorithm for LDA we
sample a topic for a group based on its posteriorPprime zgd = k Z zgd
A where zgd = k represents the set zgd which
has the same topic k (the notations are explained in Table 2)
Pprime zgd = k Z zgd A = P Z A
P Z zgd A
propΓ N k
d + αk
Γ N kd zgd
+ αk
sdotΓ N
αgd
k + βαgd
Γ Nk +V
vβv
Γ Nαgd
k zgd+ βα
gd
Γ Nk zgd+V
vβv
=Γ N k
d zgd+ Ag
d + αk
Γ N kd zgd
+ αk
sdotΓ N
agdk zgd
+ Agd + βα
gd
Γ Nagd
k zgd+ βα
gd
sdotΓ Nk zgd
+V
vβv
Γ Nk zgd+ Ag
d +V
vβv
= prodAgd
j=1N k
k zgd+ αk + jminus 1
sdotN
αgd
k zgd+ βα
gd+ jminus 1
Nk zgd+V
vβv + jminus 1
6
34 Topic Correlation Limitation The ranking of a clinical
activity v in a topic k is determined by ϕ vk From (3) it is
observed thatN vk is critical to the calculation of ϕ v
k In addi-
tion the sum of N vk among all topics is a fixed value which
equals to the frequency of clinical activities v (denoted as
N v ) in all clinical days Due to the independence of theinference procedure of LDA the frequency of v may be uni-formly allocated to various topics Thus a high-frequencyclinical activity is much easier to rank high in many topicswhile a low-frequency clinical activity can hardly rank highin even one topic It results in a large redundancy betweendifferent topics so that each topic is lacking of characteristicsIt is hard to regard these topics as the clinical goals
We consider using informativeness feature to solve thisproblem Our insight is that for an informative clinicalactivity it should rank high in a small number of topicswhich means that it can better represent these clinical goalsthan other uninformative clinical activities We incorporatetopic correlation limitation into LDA to achieve this targetThis limitation contains two aspects
(1) Restrict the distribution of N v over topics Higherinformativeness and less relevant topics
(2) Associate the calculation of ϕ vk with the informative-
ness of v
We use inverse document frequency (IDF see (7)) tomeasure the informativeness of each clinical activity Ingeneral informative clinical activities are expected to havehigher IDF and hence less relevant topics and higher ranking
IDF v = log Dd isin 1D v isinDd
7
Accordingly the new calculation of ϕ vk is denoted
as follows
ϕvk = ϕ v
k sdot IDF v 8
The core principle for the first aspect is to keep discardingthe most irrelevant topic for a clinical activity It is achievedby maintaining a relevant topic list for each clinical activityduring the iteration of Gibbs sampling The pseudocode forthis procedure is shown in Algorithm 1
The topic number upper bounder refers to the maximumnumber of the topics a clinical activity can correlate to Forexample given a clinical activity ECG and its topic numberupper bounder s ECG = 3 its N v can be only allocated tothree topics We set s v from 1 to K according to its IDFThen for each clinical activity value v we maintain acorresponding relevant topic list κ v In every δ iterationwe check whether the size of κ v has been reduced to thedesired number s v (line 12) To the clinical activities whichstill need to be restricted we will update their lists (line 13) bydiscarding the topic with minimum relevant value (RV) asdefined in
RV k v = N vk
rank k v 9
where rank k v is the ranking of clinical activity v in topic kbased on the multinomial distribution ϕk The numerator
N vk reflects the relevant degree of topic k to clinical activity
6 Journal of Healthcare Engineering
v And the denominator rank k v reflects the importance ofclinical activity v to topic k Thus higher RV represent morecorrelation between the clinical activity and the topic
By keeping on updating the relevant topic lists wegradually reduce the dimension of topic selection to thedesired range For instance suppose there are K = 5 topicsand a clinical activity ECG with a RV vector 50 150 5100 60 and s ECG = 3 the relevant topic list k ECGwould be changed from 1 2 3 4 5 to 1 2 4 5 in the nextupdating procedure As seen in line 15 of Algorithm 1 we dothe topic sampling from relevant topic list instead of all thetopics To the preinstance it means that we would sample atopic from topics 1 2 4 and 5 without topic 3 because ofits minimum RV
The complexity of deriving a sample zgd is K where K is the number of topics Therefore the overall com-plexity is KδGK = K2δG δ is the number of iterationfor updating relevant topic list and G =sumD
d Gd In practice Kwould not be a large value and δ could be treated as a con-stant so that G is the key factor that contributes to thecomplexity It means that the proposed method scales lin-early as we increase the total number of groups
35 Process Mining Instead of the detailed and complexclinical activities we use the discovered topics to representeach clinical day Hence each patient can be regarded as atopic-based sequence We adopt the process mining frame-work proposed in [6] to derive a concise and interpretableprocess model
4 Experiments
In this section we conducted a series of experiments todemonstrate the effectiveness of the proposed methods indiscovering quality CP model We begin with the descriptionof experimental settings
41 Experimental Settings To evaluate both the suitabilityand generality of our approach we collected two real-world billing datasets from a city-centre hospital Thetwo datasets are about two different diseases an internaldiseasemdashintracerebral hemorrhage (ICH) and a surgicaldiseasemdashinguinal hernia (IH) The detailed statistics aresummarized in Table 3 We set the Dirichlet prior toα = 1 0 and β = 0 01
The topic number K is determined by a trade-off strategyproposed in [6] It attempt to find a balance between theperplexity of topic modeling and the topic label size Theformer one is commonly used in evaluating the topic modelsrsquoprediction ability It is defined as the reciprocal geometricmean of the likelihood of the data Lower perplexity refersto better predictiveness and hence a better K while perplexityis not strongly correlated to human judgement [22] In ourapproach the size of topic label is critical to the concisenessof the final process model High topic number K wouldsignificantly reduce the interpretability of the model Thuswe select the intersection point of the two metrics acrossvarious K as the optimal topic number (K = 8 for ICH andK = 5 for IH) The iterations for updating relevant topic listδ are set to 300
42 Topic Quality The latent topics discovered by topicmodeling are interpreted as the clinical goals in ourapproach To evaluate the quality of the topics we com-pare our approach (CDG-LDA) against LDA [6] in threeaspects coherence redundancy and coverage Tables 4and 5 show the clustering results of the two methodsFor each topic we listed the top N = 10 ranked clinicalactivities and asked doctors to label it The similar topicsgenerated by different methods are in the same row Notethat given a top size N we defined (total top list) as allthe top N-ranked clinical activities among K topics Forexample refers to the 80 clinical activities for ICHwhen N = 10 and K = 8
421 Coherence We expect that the clinical activities in atopic could represent the similar clinical goal Take topic 5(the 5th row in Table 5) as the example It is observed thatthe result of LDA contains both surgery- and nursing care-related activities While in the result of CDG-LDA most ofthe clinical activities are about surgery The similar situations
Table 3 Statistics of our datasets
DiseaseTracenumber
Daynumber
Vocnumber
AvgLOS
MinLOS
MaxLOS
ICH 240 3204 752 14 2 34
IH 33 241 447 6 2 10
Input A dataset of D clinical daysTopic number K Hyper-parameters α and βIterations δ for updating relevant topic list
Output The multinomial distribution ϕ and θ1 Calculate IDF for each clinical activity2 Initialize the topic number upper bounder s v isin 1 K
for clinical activity with value v according to its IDF3 Initialize the relevant topic list κ v = 1 2hellip K for
clinical activity with value v4 Sample zgd for each group and Initialize all the count
parameters NkNkd and N v
k accordingly
5 for l = 1 to K do6 if l gt 1 then7 for v = 1 to V do8 if K minus l + 1 ge s v then9 Update κ v 10 for r = 1 to δ do11 for d = 1 to D do12 for g = 1 to Gd do13 k = zgd 14 Nk minus = Ag
d Nkd minus = Ag
d Nvk minus = Ag
d 15 Sample topic index kprime from κ v
according to (6)
16 Nkprime + = Agd N
kprimed + = Ag
d Nv
kprime + = Agd
17 Calculate and return ϕ (based on 8) and β
Algorithm 1 Gibbs sampling of CDG-LDA
7Journal of Healthcare Engineering
Table 4 Eight labeled topics of ICH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1 Imaging
(1) Doppler echocardiography (2) Dopplerultrasonography of left ventricular function(3) ventricular wall motion (4) transcranialDoppler (5) analysis of ultrasonic (6) color
Doppler ultrasonography of abdomen(7) X-radiography (8) digitized
photography (9) B mode ultrasoundand (10) film
Imaging
(1) Doppler echocardiography(2) Doppler ultrasonography of
left ventricular function (3) ventricularwall motion (4) transcranial Doppler(5) analysis of ultrasonic (6) color
Doppler ultrasonography of the abdomen(7) X-radiography (8) B mode ultrasound(9) digitized photography and (10) film
2 Medication
(1) Aceglutamide (2) sodium chloride(3) local infiltration anesthesia (4) etimicin(5) cerebrosidekinin (6) physical cooling(7) t-branch pipe (8) aerosol inhalation(9) ambroxol and (10) venous transfusion
Medication
(1) Pantoprazole (2) potassium magnesiumaspartate (3) mannitol (4) vitamin B6(5) glycerin fructose (6) vitamin C
(7) naloxone (8) piracetam (9) sodiumchloride and (10) glucose
3High-levelnursing care
(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) urinary meatus care (5) mouth care(6) skin care (7) hospital examining fee(8) mouth care package (9) ambulatory
blood pressure monitoring and(10) continuous oxygen inhalation
High-levelnursing care
(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) mouth care (5) urinary meatus care(6) skin care (7) continuous oxygeninhalation (8) mouth care package
(9) ambulatory blood pressure monitoringand (10) drainage pack
4Regular nursing
care andmedication
(1) Level II care (2) hospital examining fee(3) pantoprazole (4) vitamin B6
(5) mannitol (6) vitamin C (7) potassiummagnesium aspartate (8) venous transfusion
(9) sodium chloride and (10) glycerinfructose
Regularnursing care
(1) Level II care (2) venous transfusion(3) flusher (4) syringe (5) arterialvenous
cannula (6) therapy application(7) aceglutamide (8) intramuscular
injection (9) physical cooling venipunctureand (10) nifedipine
5Admissionexamination
(1) ECG (2) PLGA (3) fibrous protein(4) blood collection tube (5) hemostix
(6) washbasin (7) ECG event log(8) plasma prothrombin time assay
(9) activated partial thromboplastin timeand (10) prothrombin time assay
Admissionexamination
(1) Brain CT (2) venous sampling (3) ECG(4) hemostix (5) electrode (6) 12 channeldynamic electrocardiogram (7) washbasin(8) ECG event log (9) nasogastric and(10) evaluation of activities of daily living
6Biochemical
exam
(1) Urea assay (2) creatinine assay (3) uricacid assay (4) chloride assay (5) blood
cell analysis (6) potassium assay (7) sodiumassay (8) glucose assay (9) serum
bicarbonate assay and (10) excrement
Biochemicalexam
(1) Creatinine assay (2) urea assay(3) uric acid assay (4) blood cell
analysis (5) chloride assay (6) sodiumassay (7) potassium assay (8) serumbicarbonate assay (9) glucose assay
and (10) calcium assay
7Biochemical
exam
(1) Serum a-L-glucosidase assay (2) serum5primenucleotidase assay (3) plasma viscosityassay (4) glycosylated hemoglobin assay(5) whole blood viscosity (6) identification
of Rh blood group antigen (7) amylase assay(8) serum creatine kinase-MB isoenzyme
activity assay (9) lactate dehydrogenase assayand (10) serum alanine aminotransferase assay
Biochemicalexam
(1) Serum 5primenucleotidase assay (2) seruma-L-glucosidase assay (3) plasma viscosity
assay (4) serum creatine kinase-MBisoenzyme activity assay (5) glycosylated
hemoglobin assay (6) whole blood viscosity(7) identification of Rh blood group antigen(8) amylase assay (9) lactate dehydrogenaseassay and (10) serum total protein assay
8Biochemical
exam
(1) Serum aspartate aminotransferase assay(2) electrophoresis analysis of lactatedehydrogenase isoenzymes (3) urine
analysis (4) serum 7 pancreaticacyltransferase assay (5) serum alkalinephosphatase assay (6) serum alanineaminotransferase assay (7) urinalysis
(8) hepatitis A antibody assay (9) serumhigh-density lipoprotein cholesterol assayand (10) serum total cholesterol assay
Biochemicalexam
(1) Serum aspartate aminotransferase assay(2) serum albumin assay (3) urine sediment
quantitative analyze (4) inorganicphosphorus assay (5) electrophoresisanalysis of lactate dehydrogenase
isoenzymes (6) serum fructose amine assay(7) urine analysis (8) serum 7 pancreaticacyltransferase assay (9) serum alanineaminotransferase assay and (10) serum
alkaline phosphatase assay
8 Journal of Healthcare Engineering
can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice
We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree
NKQM N= 1KK
k=1
N
j=1 score Mkj log j + 1ZN
10
where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM
422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic
Table 5 Five labeled topics of IH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1Biochemical
exam
(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay
(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay
Biochemicalexam
(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay
(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay
(9) calcium assay and (10) magnesium assay
2Infectious
disease exam
(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay
(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy
Presurgicalpreparation
(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP
monitoring (9) cefuroxime and(10) infusion apparatus
3Admission
exam
(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines
(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin
(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen
Admissionexam
(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy
(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and
(10) analysis of ultrasonic
4Regular
nursing care
(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix
(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula
Regularnursing care
(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam
(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change
(9) levofloxacin and (10) middle dressing change
5Surgery
and regularnursing care
(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital
examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride
Surgery
(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application
(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring
Table 6 Topic coherence in terms of NKQMN
Dataset Method NKQM5 NKQM10 NKQM20
ICHLDA 08211 08004 07907
CDG-LDA 08448 08498 08235
IHLDA 07915 07830 07631
CDG-LDA 08316 08266 08116
9Journal of Healthcare Engineering
of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics
RE =visinΩ count v minus 1 11
whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the
(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result
Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics
423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects
(i) Examination
For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5
For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part
(ii) Treatment
Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance
3 7 11 15 19 3 7 11 15 19
0
20
40
60
80
100
ICH
0
50
100
150
200IH
LDA (N = 10) N = 10)LDA (N = 20) N = 20)
CDG-LDA (CDG-LDA (
Figure 4 Redundancy indicator RE on various topic number K and top size N
10 Journal of Healthcare Engineering
controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset
IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4
(iii) Nursing care
In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment
(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals
43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we
got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways
(i) ICH
The process can be easily divided into three stages
(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations
(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms
(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states
5 6 7 8 3
2 3
2
4
1 2 4 5
Figure 5 ICH process model
3 1
2
5 2
4
Figure 6 IH process model
11Journal of Healthcare Engineering
(ii) IH
Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery
(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP
5 Conclusion
In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation
One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy
Conflicts of Interest
The authors declare that there is no conflict of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC
References
[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003
[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013
[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014
[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014
[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016
[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016
[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001
[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005
[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009
[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003
[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008
[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009
[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012
[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015
[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013
[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013
[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with
12 Journal of Healthcare Engineering
visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015
[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012
[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015
[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014
[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016
[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009
[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014
13Journal of Healthcare Engineering
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal of
Volume 201
Submit your manuscripts athttpswwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
process mining framework is applied on the topic-basedsequences to get a comprehensive process model It isworth mentioning that we use the billing data as ourexperimental data Compared to other various kinds ofclinical data billing data is the most common and cred-ible one due to its importance in insurance area And itcan be easily extracted from hospital information systemwithout complex data integration technologies Billingdata is also of the lowest data granularity and containsa lot of noise which brings a great difficulty for CPMtask Thus the CPM approach on the billing data isboth valuable and challenged
Our main contributions in this paper are as follows
(i) We incorporate topic assignment constraint intoLDA to make the inference procedure consis-tent for the same clinical activities in oneclinical day
(ii) Informativeness is introduced to adjust theranking in each topic which can improve thetopic quality
(iii) Experiments on real-world datasets demonstrate thegreatperformanceofourapproach forCPMbyapply-ing process mining on the quality discovered topics
We first give the definitions used in this paper
Definition 1 (clinical activity) A clinical activity is an eventoccurring at a particular time point which refers to the basicelement in treatment
Definition 2 (clinical day and patient trace) A patient tracecontains a series of clinical activities We segment eachpatient trace into a collection of clinical days and a clinicalday represents a set of clinical activities occurring in the samehospitalized day
2 Related Work
Early in 2001 Lin et al [7] pointed out the shortage of theexpert-designed CPs in handling the great variances causedby individual differences So that a more dynamic and
Penicillin penicillin penicillin penicillin and others
Antibacterial drugImaging
Nursing care
Blood exam
Topic imaging
(1) CT(2) Syringe(3) MRI
Topic nursing
(1) Syringe(2) Mouth care(3) Skin care
Topic drug
(1) Syringe(2) Penicillin(3) Vitamin
⦙ ⦙ ⦙
(a) (b)
Figure 1 Two problems in applying LDA on clinical data (a) the clinical activities in one day for a patient and (b) the three discovered topics
Visit ID Item name Amount Occurrence time
T1T1T1T1T1
T2T2
ABACD
BC
32125
42
20146122014612201461220146122014613
20151132015113
⦙ ⦙ ⦙ ⦙
⦙ ⦙ ⦙ ⦙
⦙ ⦙
⦙ ⦙⦙ ⦙
⦙ ⦙
⦙
⦙
A A A B B A C C
D D DD D
B B BB CC
Day 1
Day 2
Day 1
G2G1 G3T1 Topic 1 A 06 B 03 and others
Topic 2 C 07 D 01 and others
Topic 3 B 04 C 03 and others
Goal 1 imaging CT MRI and others
Goal 2 blood examCreatinine sodium and others
Goal 3 medication Penicillin vitamin and others
Data preparation Topic modeling Process mining
T1
Goal 1 Goal 2
Day 1 Day 2hellip
T2
Goals 3 and 1 Goal 1
Day 1 Day 2hellip
T1
Topic 1 Topic 2
Day 1 Day 2hellip
T2
Topics 3 and 1 Topic 1
Day 1 Day 2hellip
Busin
ess d
omai
nA
lgor
ithm
dom
ain
Pipeline
a1 a2 a3 a4 a5
T2
VOC
ABCD
Goals 3 and 1
Goal 1
Goal 2
Topic assignment constraint Topic correlation limitation
Topics 3 and 1
Topic 1
Topic 2
Figure 2 The illustrative process of our approach The top part is the pipeline middle part is the business domain and bottom part is thecorresponding algorithm domain (T patient trace a clinical activity day clinical day G group VOC vocabulary)
3Journal of Healthcare Engineering
adaptive process is needed for improving the performance ofCPs They proposed a graph mining technique to extract thetime dependency pattern of CPs for curing brain stroke Thepattern can be used for predicting the paths for a newpatient In [8] hidden Markov model (HMM) was adoptedfor inferring the CP pattern which was useful in knowledgesharing and CP updating While these methods wereseverely limited to the data deficiency and complexity andbecause CPs focus on the order relations between variousclinical events process mining technologies are widely usedfor CPM A case study on a group of 627 gynecologicaloncology patients was presented in [9] by using heuristicsminer [10] which resulted in a spaghetti-like process modelLang et al [11] evaluated the performance of seven tradi-tional process mining methods on clinical data The resultsdemonstrated that these methods can hardly obtain compre-hensive process models on clinical scenario
Researchers attempted to adopt frequency-basedmethodsto improve the ability of analyzing the complex processmodels In [12] the patient traces with similar schemasand outcomes are grouped together In [13] sequenceclustering was used to identify regular behavior processvariants and infrequent behavior on the process modeldiscovered by process mining method Zhang et al [14] pro-posed a frequency-based method to group the patients intoseveral predefined core dimensions Lakshmanan et al [15]presented a CPM framework that combined clusteringprocess mining and frequent pattern mining technologiesto mine an outcome-related process model In [16] asegmentation method was proposed to discover the frequentbehavior patterns with different time intervals CareExplorer[17] was a novel CP management tool which combinedfrequent sequence mining techniques with advanced visu-alization supports Manually defining a set of concernedand important clinical activities can significantly improvethe interpretability of the process models [18ndash20] How-ever manual works are always subjective and expensivefor CPM
Considering the complexity and quantity of clinical datatopic modeling technologies are used to discover the clinicalpatterns which are useful in CP (re)design and abnormaldetection Huang et al [2] firstly introduced LDA to extractlatent topics as the treatment patterns from clinical dataEach patient trace was regarded as a mixture of differenttreatment patterns In addition the team integrated timestamps [3] patient features [4] and comorbidities [5] intoLDA to enhance the ability of summarizing patterns How-ever treating a patient trace as a whole can hardly representthe feature of CP that the treatment process is consisted ofseveral stages with different clinical goals To deal with thisproblem our previous work [6] puts the emphasis on dis-covering the clinical goals from clinical data by LDA andderiving the order relations between these goals by processmining method It was effective in generating a conciseordinal and interpretable process model as CP While asdiscussed in [21] the performance of this approach isquite limited to the effectiveness of LDA
In this paper we extend the work in [21] to enhance theability of discovering quality clinical goals by incorporating
the topic assignment constraint and topic correlation limita-tion into LDA
3 Methodology
We first introduce the conversion from billing data to theword-count format for LDA Then we present a brief reviewof applying LDA on clinical data After analyzing the twoshortages of the raw LDA we incorporate topic assignmentconstraint and topic correlation limitation into LDA toimprove the performance of discovering latent topics fromclinical data
31 Data Preparation Billing data is the most commondata recorded in hospital information system (an exam-ple is shown in the upper-left part of Figure 2) Fourbasic attributions are essential for the topic modelingtrace ID (the identifier for one patient trace) itemname amount and occurrence time (in the unit ofday) Each billing item (one row in the table) representsa collection of clinical activities such as the first rowltA 3gt means there are three clinical activity A Notethat the amount of different kinds of items should benormalized into the same scale first We denote theclinical activity domain as the vocabulary The billingitems with the same trace ID and occurrence timeconstitute one clinical day With respect to LDA theclinical day with clinical activities corresponds to thedocument with words
32 Brief Introduction for LDA LDA is one of the mostpopular statistical topic modeling technologies It modelsthe generative process of each word from each documentin a text dataset There are two core model parametersin LDA a corpus-level distribution over words for eachtopic and a document-level distribution over topics foreach document In clinical data by regarding the clinicalday and clinical activity as the document and wordrespectively the generative process of LDA has great con-formity with the two abovementioned clinical practicesThe graphical representation is shown in Figure 3(a) andthe notations are explained in Table 2 The generative processof LDA is as follows
(1) Draw distribution ϕksimDir β k = 1 2hellip K(2) For dth clinical day d = 1 2hellipD
(a) Draw distribution θdsimDir α
(b) For ith clinical activity in dth clinical dayi = 1 2hellipNd
(i) Draw zdisimMulti θd(ii) Draw adisimMulti ϕzdi
According to the graphical model we need an infer-ence to find the parameters which can best match theobserved clinical data Gibbs sampling a special case ofMarkov chain Monte Carlo (MCMC) is the most popular
4 Journal of Healthcare Engineering
inference method for LDA It is based on the joint dis-tribution of latent topics and observed clinical activitiesas follows
P Z A α β = P Z α sdotP A Z β
= P Z θ P θ α dθ sdot P A Z ϕ P ϕ β dϕ
prop prodK
k=1prodD
d=1Γ N k
d + αk sdotprodV
v=1Γ N vk + βv
Γ Nk +V
v=1βv
1where Γ middot is the gamma function
Given the joint distribution the topic assignment foreach clinical activity can be written as follows
P zdi = k Z zdi A = P Z AP Z zdi A
propN v
k zdi + βv
V
v=1Nvk zdi + βv
sdot N kd zdi + αk
2
After the Gibbs sampling the two important hiddenparameters ϕ and θ are computed as follows
ϕ vk = N v
k + βv
V
v=1Nvk + βv
θ kd = N k
d + αk
K
k=1Nkd + αk
3
where ϕ vk is the probability of clinical activity v which
belongs to topic k and θ kd is the probability of the dth clinical
day which contains topic k
33 Topic Assignment Constraint We can see that theinference procedure is mutually independent for all theclinical activities (as shown in (2)) So that even the sameclinical activities in one clinical day may get different topicswhich violates to the clinical practice
In this section we adopt the chain graph proposed in [21]to incorporate topic assignment constraint into LDA It cancoerce the same clinical activities on one clinical day to
Table 2 Meanings of the notations
Notation Meaning
D The set of clinical days
A The set of clinical activities
Z The set of topics assigned to A
D K The number of clinical days and topics
V The number of unique clinical activities
Nd The number of clinical activities in dth clinical day
NkThe number of clinical activities that are assigned
topic k
GdThe number of groups (unique clinical activities)
in dth clinical day
Agd
The number of clinical activities in gth group indth clinical day
adi The ith clinical activities in dth clinical day
zdi The topic for adiagd The clinical activities of gth group in dth clinical day
agdj The jth clinical activity in gth group in dth clinical day
zgdj The topic for agdj
zgdzgdj
Agd
j=1 the set of topics for gth group in dth
clinical day
N kd
The number of clinical activities that are assignedtopic k in dth clinical day
N vk
The number of clinical activities with value v and topic k
N kd zgd
The number of clinical activities that are assigned topick in dth clinical day except gth group in dth clinical day
N vk zgd
The number of clinical activities with value v andtopic k except gth group in dth clinical day
α β Dirichlet prior vector
ϕ θ Multinomial distribution over clinical activitiesand topics
ϕkMultinomial distribution over clinical activities for
topic k
θd Multinomial distribution over topics for dth clinical day
훼
훽
휃d
k
zdi
adi
i
d
kisin[1 K] [1 Nd][1 D]
isin
isin
d
k
d1
d1
d
k [1 K]
d2
d2
ds
ds
s= d
g g g
g g g
g [1Gd][1D]
isin isin
isin
(a) (b)
Figure 3 Graphical representation of (a) LDA and (b) CDG-LDA
5Journal of Healthcare Engineering
take on the same topic The graphical model is shown inFigure 3(b) In each clinical day we first put all the sameclinical activities into a group and then use indirect edgesto connect them A function Cg
d is imposed to express thetopic assignment constraint as follows
C zgd = 1 if zgd1 = zgd2 =⋯ = zgdAg
d
0 otherwise4
For each group Cgd would equal to 1 only when all the
clinical activities in the group get the same topic Our goalis to make all the groups conform to the constraint Sothat we add Cg
d in the joint distribution of LDA (1) as follows
Pprime Z A = 1Δ P Z A prod
dgC zgd 5
where Δ is the normalization constantSimilar to the Gibbs sampling algorithm for LDA we
sample a topic for a group based on its posteriorPprime zgd = k Z zgd
A where zgd = k represents the set zgd which
has the same topic k (the notations are explained in Table 2)
Pprime zgd = k Z zgd A = P Z A
P Z zgd A
propΓ N k
d + αk
Γ N kd zgd
+ αk
sdotΓ N
αgd
k + βαgd
Γ Nk +V
vβv
Γ Nαgd
k zgd+ βα
gd
Γ Nk zgd+V
vβv
=Γ N k
d zgd+ Ag
d + αk
Γ N kd zgd
+ αk
sdotΓ N
agdk zgd
+ Agd + βα
gd
Γ Nagd
k zgd+ βα
gd
sdotΓ Nk zgd
+V
vβv
Γ Nk zgd+ Ag
d +V
vβv
= prodAgd
j=1N k
k zgd+ αk + jminus 1
sdotN
αgd
k zgd+ βα
gd+ jminus 1
Nk zgd+V
vβv + jminus 1
6
34 Topic Correlation Limitation The ranking of a clinical
activity v in a topic k is determined by ϕ vk From (3) it is
observed thatN vk is critical to the calculation of ϕ v
k In addi-
tion the sum of N vk among all topics is a fixed value which
equals to the frequency of clinical activities v (denoted as
N v ) in all clinical days Due to the independence of theinference procedure of LDA the frequency of v may be uni-formly allocated to various topics Thus a high-frequencyclinical activity is much easier to rank high in many topicswhile a low-frequency clinical activity can hardly rank highin even one topic It results in a large redundancy betweendifferent topics so that each topic is lacking of characteristicsIt is hard to regard these topics as the clinical goals
We consider using informativeness feature to solve thisproblem Our insight is that for an informative clinicalactivity it should rank high in a small number of topicswhich means that it can better represent these clinical goalsthan other uninformative clinical activities We incorporatetopic correlation limitation into LDA to achieve this targetThis limitation contains two aspects
(1) Restrict the distribution of N v over topics Higherinformativeness and less relevant topics
(2) Associate the calculation of ϕ vk with the informative-
ness of v
We use inverse document frequency (IDF see (7)) tomeasure the informativeness of each clinical activity Ingeneral informative clinical activities are expected to havehigher IDF and hence less relevant topics and higher ranking
IDF v = log Dd isin 1D v isinDd
7
Accordingly the new calculation of ϕ vk is denoted
as follows
ϕvk = ϕ v
k sdot IDF v 8
The core principle for the first aspect is to keep discardingthe most irrelevant topic for a clinical activity It is achievedby maintaining a relevant topic list for each clinical activityduring the iteration of Gibbs sampling The pseudocode forthis procedure is shown in Algorithm 1
The topic number upper bounder refers to the maximumnumber of the topics a clinical activity can correlate to Forexample given a clinical activity ECG and its topic numberupper bounder s ECG = 3 its N v can be only allocated tothree topics We set s v from 1 to K according to its IDFThen for each clinical activity value v we maintain acorresponding relevant topic list κ v In every δ iterationwe check whether the size of κ v has been reduced to thedesired number s v (line 12) To the clinical activities whichstill need to be restricted we will update their lists (line 13) bydiscarding the topic with minimum relevant value (RV) asdefined in
RV k v = N vk
rank k v 9
where rank k v is the ranking of clinical activity v in topic kbased on the multinomial distribution ϕk The numerator
N vk reflects the relevant degree of topic k to clinical activity
6 Journal of Healthcare Engineering
v And the denominator rank k v reflects the importance ofclinical activity v to topic k Thus higher RV represent morecorrelation between the clinical activity and the topic
By keeping on updating the relevant topic lists wegradually reduce the dimension of topic selection to thedesired range For instance suppose there are K = 5 topicsand a clinical activity ECG with a RV vector 50 150 5100 60 and s ECG = 3 the relevant topic list k ECGwould be changed from 1 2 3 4 5 to 1 2 4 5 in the nextupdating procedure As seen in line 15 of Algorithm 1 we dothe topic sampling from relevant topic list instead of all thetopics To the preinstance it means that we would sample atopic from topics 1 2 4 and 5 without topic 3 because ofits minimum RV
The complexity of deriving a sample zgd is K where K is the number of topics Therefore the overall com-plexity is KδGK = K2δG δ is the number of iterationfor updating relevant topic list and G =sumD
d Gd In practice Kwould not be a large value and δ could be treated as a con-stant so that G is the key factor that contributes to thecomplexity It means that the proposed method scales lin-early as we increase the total number of groups
35 Process Mining Instead of the detailed and complexclinical activities we use the discovered topics to representeach clinical day Hence each patient can be regarded as atopic-based sequence We adopt the process mining frame-work proposed in [6] to derive a concise and interpretableprocess model
4 Experiments
In this section we conducted a series of experiments todemonstrate the effectiveness of the proposed methods indiscovering quality CP model We begin with the descriptionof experimental settings
41 Experimental Settings To evaluate both the suitabilityand generality of our approach we collected two real-world billing datasets from a city-centre hospital Thetwo datasets are about two different diseases an internaldiseasemdashintracerebral hemorrhage (ICH) and a surgicaldiseasemdashinguinal hernia (IH) The detailed statistics aresummarized in Table 3 We set the Dirichlet prior toα = 1 0 and β = 0 01
The topic number K is determined by a trade-off strategyproposed in [6] It attempt to find a balance between theperplexity of topic modeling and the topic label size Theformer one is commonly used in evaluating the topic modelsrsquoprediction ability It is defined as the reciprocal geometricmean of the likelihood of the data Lower perplexity refersto better predictiveness and hence a better K while perplexityis not strongly correlated to human judgement [22] In ourapproach the size of topic label is critical to the concisenessof the final process model High topic number K wouldsignificantly reduce the interpretability of the model Thuswe select the intersection point of the two metrics acrossvarious K as the optimal topic number (K = 8 for ICH andK = 5 for IH) The iterations for updating relevant topic listδ are set to 300
42 Topic Quality The latent topics discovered by topicmodeling are interpreted as the clinical goals in ourapproach To evaluate the quality of the topics we com-pare our approach (CDG-LDA) against LDA [6] in threeaspects coherence redundancy and coverage Tables 4and 5 show the clustering results of the two methodsFor each topic we listed the top N = 10 ranked clinicalactivities and asked doctors to label it The similar topicsgenerated by different methods are in the same row Notethat given a top size N we defined (total top list) as allthe top N-ranked clinical activities among K topics Forexample refers to the 80 clinical activities for ICHwhen N = 10 and K = 8
421 Coherence We expect that the clinical activities in atopic could represent the similar clinical goal Take topic 5(the 5th row in Table 5) as the example It is observed thatthe result of LDA contains both surgery- and nursing care-related activities While in the result of CDG-LDA most ofthe clinical activities are about surgery The similar situations
Table 3 Statistics of our datasets
DiseaseTracenumber
Daynumber
Vocnumber
AvgLOS
MinLOS
MaxLOS
ICH 240 3204 752 14 2 34
IH 33 241 447 6 2 10
Input A dataset of D clinical daysTopic number K Hyper-parameters α and βIterations δ for updating relevant topic list
Output The multinomial distribution ϕ and θ1 Calculate IDF for each clinical activity2 Initialize the topic number upper bounder s v isin 1 K
for clinical activity with value v according to its IDF3 Initialize the relevant topic list κ v = 1 2hellip K for
clinical activity with value v4 Sample zgd for each group and Initialize all the count
parameters NkNkd and N v
k accordingly
5 for l = 1 to K do6 if l gt 1 then7 for v = 1 to V do8 if K minus l + 1 ge s v then9 Update κ v 10 for r = 1 to δ do11 for d = 1 to D do12 for g = 1 to Gd do13 k = zgd 14 Nk minus = Ag
d Nkd minus = Ag
d Nvk minus = Ag
d 15 Sample topic index kprime from κ v
according to (6)
16 Nkprime + = Agd N
kprimed + = Ag
d Nv
kprime + = Agd
17 Calculate and return ϕ (based on 8) and β
Algorithm 1 Gibbs sampling of CDG-LDA
7Journal of Healthcare Engineering
Table 4 Eight labeled topics of ICH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1 Imaging
(1) Doppler echocardiography (2) Dopplerultrasonography of left ventricular function(3) ventricular wall motion (4) transcranialDoppler (5) analysis of ultrasonic (6) color
Doppler ultrasonography of abdomen(7) X-radiography (8) digitized
photography (9) B mode ultrasoundand (10) film
Imaging
(1) Doppler echocardiography(2) Doppler ultrasonography of
left ventricular function (3) ventricularwall motion (4) transcranial Doppler(5) analysis of ultrasonic (6) color
Doppler ultrasonography of the abdomen(7) X-radiography (8) B mode ultrasound(9) digitized photography and (10) film
2 Medication
(1) Aceglutamide (2) sodium chloride(3) local infiltration anesthesia (4) etimicin(5) cerebrosidekinin (6) physical cooling(7) t-branch pipe (8) aerosol inhalation(9) ambroxol and (10) venous transfusion
Medication
(1) Pantoprazole (2) potassium magnesiumaspartate (3) mannitol (4) vitamin B6(5) glycerin fructose (6) vitamin C
(7) naloxone (8) piracetam (9) sodiumchloride and (10) glucose
3High-levelnursing care
(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) urinary meatus care (5) mouth care(6) skin care (7) hospital examining fee(8) mouth care package (9) ambulatory
blood pressure monitoring and(10) continuous oxygen inhalation
High-levelnursing care
(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) mouth care (5) urinary meatus care(6) skin care (7) continuous oxygeninhalation (8) mouth care package
(9) ambulatory blood pressure monitoringand (10) drainage pack
4Regular nursing
care andmedication
(1) Level II care (2) hospital examining fee(3) pantoprazole (4) vitamin B6
(5) mannitol (6) vitamin C (7) potassiummagnesium aspartate (8) venous transfusion
(9) sodium chloride and (10) glycerinfructose
Regularnursing care
(1) Level II care (2) venous transfusion(3) flusher (4) syringe (5) arterialvenous
cannula (6) therapy application(7) aceglutamide (8) intramuscular
injection (9) physical cooling venipunctureand (10) nifedipine
5Admissionexamination
(1) ECG (2) PLGA (3) fibrous protein(4) blood collection tube (5) hemostix
(6) washbasin (7) ECG event log(8) plasma prothrombin time assay
(9) activated partial thromboplastin timeand (10) prothrombin time assay
Admissionexamination
(1) Brain CT (2) venous sampling (3) ECG(4) hemostix (5) electrode (6) 12 channeldynamic electrocardiogram (7) washbasin(8) ECG event log (9) nasogastric and(10) evaluation of activities of daily living
6Biochemical
exam
(1) Urea assay (2) creatinine assay (3) uricacid assay (4) chloride assay (5) blood
cell analysis (6) potassium assay (7) sodiumassay (8) glucose assay (9) serum
bicarbonate assay and (10) excrement
Biochemicalexam
(1) Creatinine assay (2) urea assay(3) uric acid assay (4) blood cell
analysis (5) chloride assay (6) sodiumassay (7) potassium assay (8) serumbicarbonate assay (9) glucose assay
and (10) calcium assay
7Biochemical
exam
(1) Serum a-L-glucosidase assay (2) serum5primenucleotidase assay (3) plasma viscosityassay (4) glycosylated hemoglobin assay(5) whole blood viscosity (6) identification
of Rh blood group antigen (7) amylase assay(8) serum creatine kinase-MB isoenzyme
activity assay (9) lactate dehydrogenase assayand (10) serum alanine aminotransferase assay
Biochemicalexam
(1) Serum 5primenucleotidase assay (2) seruma-L-glucosidase assay (3) plasma viscosity
assay (4) serum creatine kinase-MBisoenzyme activity assay (5) glycosylated
hemoglobin assay (6) whole blood viscosity(7) identification of Rh blood group antigen(8) amylase assay (9) lactate dehydrogenaseassay and (10) serum total protein assay
8Biochemical
exam
(1) Serum aspartate aminotransferase assay(2) electrophoresis analysis of lactatedehydrogenase isoenzymes (3) urine
analysis (4) serum 7 pancreaticacyltransferase assay (5) serum alkalinephosphatase assay (6) serum alanineaminotransferase assay (7) urinalysis
(8) hepatitis A antibody assay (9) serumhigh-density lipoprotein cholesterol assayand (10) serum total cholesterol assay
Biochemicalexam
(1) Serum aspartate aminotransferase assay(2) serum albumin assay (3) urine sediment
quantitative analyze (4) inorganicphosphorus assay (5) electrophoresisanalysis of lactate dehydrogenase
isoenzymes (6) serum fructose amine assay(7) urine analysis (8) serum 7 pancreaticacyltransferase assay (9) serum alanineaminotransferase assay and (10) serum
alkaline phosphatase assay
8 Journal of Healthcare Engineering
can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice
We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree
NKQM N= 1KK
k=1
N
j=1 score Mkj log j + 1ZN
10
where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM
422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic
Table 5 Five labeled topics of IH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1Biochemical
exam
(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay
(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay
Biochemicalexam
(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay
(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay
(9) calcium assay and (10) magnesium assay
2Infectious
disease exam
(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay
(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy
Presurgicalpreparation
(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP
monitoring (9) cefuroxime and(10) infusion apparatus
3Admission
exam
(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines
(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin
(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen
Admissionexam
(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy
(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and
(10) analysis of ultrasonic
4Regular
nursing care
(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix
(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula
Regularnursing care
(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam
(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change
(9) levofloxacin and (10) middle dressing change
5Surgery
and regularnursing care
(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital
examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride
Surgery
(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application
(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring
Table 6 Topic coherence in terms of NKQMN
Dataset Method NKQM5 NKQM10 NKQM20
ICHLDA 08211 08004 07907
CDG-LDA 08448 08498 08235
IHLDA 07915 07830 07631
CDG-LDA 08316 08266 08116
9Journal of Healthcare Engineering
of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics
RE =visinΩ count v minus 1 11
whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the
(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result
Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics
423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects
(i) Examination
For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5
For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part
(ii) Treatment
Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance
3 7 11 15 19 3 7 11 15 19
0
20
40
60
80
100
ICH
0
50
100
150
200IH
LDA (N = 10) N = 10)LDA (N = 20) N = 20)
CDG-LDA (CDG-LDA (
Figure 4 Redundancy indicator RE on various topic number K and top size N
10 Journal of Healthcare Engineering
controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset
IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4
(iii) Nursing care
In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment
(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals
43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we
got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways
(i) ICH
The process can be easily divided into three stages
(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations
(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms
(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states
5 6 7 8 3
2 3
2
4
1 2 4 5
Figure 5 ICH process model
3 1
2
5 2
4
Figure 6 IH process model
11Journal of Healthcare Engineering
(ii) IH
Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery
(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP
5 Conclusion
In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation
One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy
Conflicts of Interest
The authors declare that there is no conflict of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC
References
[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003
[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013
[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014
[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014
[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016
[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016
[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001
[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005
[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009
[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003
[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008
[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009
[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012
[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015
[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013
[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013
[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with
12 Journal of Healthcare Engineering
visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015
[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012
[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015
[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014
[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016
[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009
[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014
13Journal of Healthcare Engineering
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal of
Volume 201
Submit your manuscripts athttpswwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
adaptive process is needed for improving the performance ofCPs They proposed a graph mining technique to extract thetime dependency pattern of CPs for curing brain stroke Thepattern can be used for predicting the paths for a newpatient In [8] hidden Markov model (HMM) was adoptedfor inferring the CP pattern which was useful in knowledgesharing and CP updating While these methods wereseverely limited to the data deficiency and complexity andbecause CPs focus on the order relations between variousclinical events process mining technologies are widely usedfor CPM A case study on a group of 627 gynecologicaloncology patients was presented in [9] by using heuristicsminer [10] which resulted in a spaghetti-like process modelLang et al [11] evaluated the performance of seven tradi-tional process mining methods on clinical data The resultsdemonstrated that these methods can hardly obtain compre-hensive process models on clinical scenario
Researchers attempted to adopt frequency-basedmethodsto improve the ability of analyzing the complex processmodels In [12] the patient traces with similar schemasand outcomes are grouped together In [13] sequenceclustering was used to identify regular behavior processvariants and infrequent behavior on the process modeldiscovered by process mining method Zhang et al [14] pro-posed a frequency-based method to group the patients intoseveral predefined core dimensions Lakshmanan et al [15]presented a CPM framework that combined clusteringprocess mining and frequent pattern mining technologiesto mine an outcome-related process model In [16] asegmentation method was proposed to discover the frequentbehavior patterns with different time intervals CareExplorer[17] was a novel CP management tool which combinedfrequent sequence mining techniques with advanced visu-alization supports Manually defining a set of concernedand important clinical activities can significantly improvethe interpretability of the process models [18ndash20] How-ever manual works are always subjective and expensivefor CPM
Considering the complexity and quantity of clinical datatopic modeling technologies are used to discover the clinicalpatterns which are useful in CP (re)design and abnormaldetection Huang et al [2] firstly introduced LDA to extractlatent topics as the treatment patterns from clinical dataEach patient trace was regarded as a mixture of differenttreatment patterns In addition the team integrated timestamps [3] patient features [4] and comorbidities [5] intoLDA to enhance the ability of summarizing patterns How-ever treating a patient trace as a whole can hardly representthe feature of CP that the treatment process is consisted ofseveral stages with different clinical goals To deal with thisproblem our previous work [6] puts the emphasis on dis-covering the clinical goals from clinical data by LDA andderiving the order relations between these goals by processmining method It was effective in generating a conciseordinal and interpretable process model as CP While asdiscussed in [21] the performance of this approach isquite limited to the effectiveness of LDA
In this paper we extend the work in [21] to enhance theability of discovering quality clinical goals by incorporating
the topic assignment constraint and topic correlation limita-tion into LDA
3 Methodology
We first introduce the conversion from billing data to theword-count format for LDA Then we present a brief reviewof applying LDA on clinical data After analyzing the twoshortages of the raw LDA we incorporate topic assignmentconstraint and topic correlation limitation into LDA toimprove the performance of discovering latent topics fromclinical data
31 Data Preparation Billing data is the most commondata recorded in hospital information system (an exam-ple is shown in the upper-left part of Figure 2) Fourbasic attributions are essential for the topic modelingtrace ID (the identifier for one patient trace) itemname amount and occurrence time (in the unit ofday) Each billing item (one row in the table) representsa collection of clinical activities such as the first rowltA 3gt means there are three clinical activity A Notethat the amount of different kinds of items should benormalized into the same scale first We denote theclinical activity domain as the vocabulary The billingitems with the same trace ID and occurrence timeconstitute one clinical day With respect to LDA theclinical day with clinical activities corresponds to thedocument with words
32 Brief Introduction for LDA LDA is one of the mostpopular statistical topic modeling technologies It modelsthe generative process of each word from each documentin a text dataset There are two core model parametersin LDA a corpus-level distribution over words for eachtopic and a document-level distribution over topics foreach document In clinical data by regarding the clinicalday and clinical activity as the document and wordrespectively the generative process of LDA has great con-formity with the two abovementioned clinical practicesThe graphical representation is shown in Figure 3(a) andthe notations are explained in Table 2 The generative processof LDA is as follows
(1) Draw distribution ϕksimDir β k = 1 2hellip K(2) For dth clinical day d = 1 2hellipD
(a) Draw distribution θdsimDir α
(b) For ith clinical activity in dth clinical dayi = 1 2hellipNd
(i) Draw zdisimMulti θd(ii) Draw adisimMulti ϕzdi
According to the graphical model we need an infer-ence to find the parameters which can best match theobserved clinical data Gibbs sampling a special case ofMarkov chain Monte Carlo (MCMC) is the most popular
4 Journal of Healthcare Engineering
inference method for LDA It is based on the joint dis-tribution of latent topics and observed clinical activitiesas follows
P Z A α β = P Z α sdotP A Z β
= P Z θ P θ α dθ sdot P A Z ϕ P ϕ β dϕ
prop prodK
k=1prodD
d=1Γ N k
d + αk sdotprodV
v=1Γ N vk + βv
Γ Nk +V
v=1βv
1where Γ middot is the gamma function
Given the joint distribution the topic assignment foreach clinical activity can be written as follows
P zdi = k Z zdi A = P Z AP Z zdi A
propN v
k zdi + βv
V
v=1Nvk zdi + βv
sdot N kd zdi + αk
2
After the Gibbs sampling the two important hiddenparameters ϕ and θ are computed as follows
ϕ vk = N v
k + βv
V
v=1Nvk + βv
θ kd = N k
d + αk
K
k=1Nkd + αk
3
where ϕ vk is the probability of clinical activity v which
belongs to topic k and θ kd is the probability of the dth clinical
day which contains topic k
33 Topic Assignment Constraint We can see that theinference procedure is mutually independent for all theclinical activities (as shown in (2)) So that even the sameclinical activities in one clinical day may get different topicswhich violates to the clinical practice
In this section we adopt the chain graph proposed in [21]to incorporate topic assignment constraint into LDA It cancoerce the same clinical activities on one clinical day to
Table 2 Meanings of the notations
Notation Meaning
D The set of clinical days
A The set of clinical activities
Z The set of topics assigned to A
D K The number of clinical days and topics
V The number of unique clinical activities
Nd The number of clinical activities in dth clinical day
NkThe number of clinical activities that are assigned
topic k
GdThe number of groups (unique clinical activities)
in dth clinical day
Agd
The number of clinical activities in gth group indth clinical day
adi The ith clinical activities in dth clinical day
zdi The topic for adiagd The clinical activities of gth group in dth clinical day
agdj The jth clinical activity in gth group in dth clinical day
zgdj The topic for agdj
zgdzgdj
Agd
j=1 the set of topics for gth group in dth
clinical day
N kd
The number of clinical activities that are assignedtopic k in dth clinical day
N vk
The number of clinical activities with value v and topic k
N kd zgd
The number of clinical activities that are assigned topick in dth clinical day except gth group in dth clinical day
N vk zgd
The number of clinical activities with value v andtopic k except gth group in dth clinical day
α β Dirichlet prior vector
ϕ θ Multinomial distribution over clinical activitiesand topics
ϕkMultinomial distribution over clinical activities for
topic k
θd Multinomial distribution over topics for dth clinical day
훼
훽
휃d
k
zdi
adi
i
d
kisin[1 K] [1 Nd][1 D]
isin
isin
d
k
d1
d1
d
k [1 K]
d2
d2
ds
ds
s= d
g g g
g g g
g [1Gd][1D]
isin isin
isin
(a) (b)
Figure 3 Graphical representation of (a) LDA and (b) CDG-LDA
5Journal of Healthcare Engineering
take on the same topic The graphical model is shown inFigure 3(b) In each clinical day we first put all the sameclinical activities into a group and then use indirect edgesto connect them A function Cg
d is imposed to express thetopic assignment constraint as follows
C zgd = 1 if zgd1 = zgd2 =⋯ = zgdAg
d
0 otherwise4
For each group Cgd would equal to 1 only when all the
clinical activities in the group get the same topic Our goalis to make all the groups conform to the constraint Sothat we add Cg
d in the joint distribution of LDA (1) as follows
Pprime Z A = 1Δ P Z A prod
dgC zgd 5
where Δ is the normalization constantSimilar to the Gibbs sampling algorithm for LDA we
sample a topic for a group based on its posteriorPprime zgd = k Z zgd
A where zgd = k represents the set zgd which
has the same topic k (the notations are explained in Table 2)
Pprime zgd = k Z zgd A = P Z A
P Z zgd A
propΓ N k
d + αk
Γ N kd zgd
+ αk
sdotΓ N
αgd
k + βαgd
Γ Nk +V
vβv
Γ Nαgd
k zgd+ βα
gd
Γ Nk zgd+V
vβv
=Γ N k
d zgd+ Ag
d + αk
Γ N kd zgd
+ αk
sdotΓ N
agdk zgd
+ Agd + βα
gd
Γ Nagd
k zgd+ βα
gd
sdotΓ Nk zgd
+V
vβv
Γ Nk zgd+ Ag
d +V
vβv
= prodAgd
j=1N k
k zgd+ αk + jminus 1
sdotN
αgd
k zgd+ βα
gd+ jminus 1
Nk zgd+V
vβv + jminus 1
6
34 Topic Correlation Limitation The ranking of a clinical
activity v in a topic k is determined by ϕ vk From (3) it is
observed thatN vk is critical to the calculation of ϕ v
k In addi-
tion the sum of N vk among all topics is a fixed value which
equals to the frequency of clinical activities v (denoted as
N v ) in all clinical days Due to the independence of theinference procedure of LDA the frequency of v may be uni-formly allocated to various topics Thus a high-frequencyclinical activity is much easier to rank high in many topicswhile a low-frequency clinical activity can hardly rank highin even one topic It results in a large redundancy betweendifferent topics so that each topic is lacking of characteristicsIt is hard to regard these topics as the clinical goals
We consider using informativeness feature to solve thisproblem Our insight is that for an informative clinicalactivity it should rank high in a small number of topicswhich means that it can better represent these clinical goalsthan other uninformative clinical activities We incorporatetopic correlation limitation into LDA to achieve this targetThis limitation contains two aspects
(1) Restrict the distribution of N v over topics Higherinformativeness and less relevant topics
(2) Associate the calculation of ϕ vk with the informative-
ness of v
We use inverse document frequency (IDF see (7)) tomeasure the informativeness of each clinical activity Ingeneral informative clinical activities are expected to havehigher IDF and hence less relevant topics and higher ranking
IDF v = log Dd isin 1D v isinDd
7
Accordingly the new calculation of ϕ vk is denoted
as follows
ϕvk = ϕ v
k sdot IDF v 8
The core principle for the first aspect is to keep discardingthe most irrelevant topic for a clinical activity It is achievedby maintaining a relevant topic list for each clinical activityduring the iteration of Gibbs sampling The pseudocode forthis procedure is shown in Algorithm 1
The topic number upper bounder refers to the maximumnumber of the topics a clinical activity can correlate to Forexample given a clinical activity ECG and its topic numberupper bounder s ECG = 3 its N v can be only allocated tothree topics We set s v from 1 to K according to its IDFThen for each clinical activity value v we maintain acorresponding relevant topic list κ v In every δ iterationwe check whether the size of κ v has been reduced to thedesired number s v (line 12) To the clinical activities whichstill need to be restricted we will update their lists (line 13) bydiscarding the topic with minimum relevant value (RV) asdefined in
RV k v = N vk
rank k v 9
where rank k v is the ranking of clinical activity v in topic kbased on the multinomial distribution ϕk The numerator
N vk reflects the relevant degree of topic k to clinical activity
6 Journal of Healthcare Engineering
v And the denominator rank k v reflects the importance ofclinical activity v to topic k Thus higher RV represent morecorrelation between the clinical activity and the topic
By keeping on updating the relevant topic lists wegradually reduce the dimension of topic selection to thedesired range For instance suppose there are K = 5 topicsand a clinical activity ECG with a RV vector 50 150 5100 60 and s ECG = 3 the relevant topic list k ECGwould be changed from 1 2 3 4 5 to 1 2 4 5 in the nextupdating procedure As seen in line 15 of Algorithm 1 we dothe topic sampling from relevant topic list instead of all thetopics To the preinstance it means that we would sample atopic from topics 1 2 4 and 5 without topic 3 because ofits minimum RV
The complexity of deriving a sample zgd is K where K is the number of topics Therefore the overall com-plexity is KδGK = K2δG δ is the number of iterationfor updating relevant topic list and G =sumD
d Gd In practice Kwould not be a large value and δ could be treated as a con-stant so that G is the key factor that contributes to thecomplexity It means that the proposed method scales lin-early as we increase the total number of groups
35 Process Mining Instead of the detailed and complexclinical activities we use the discovered topics to representeach clinical day Hence each patient can be regarded as atopic-based sequence We adopt the process mining frame-work proposed in [6] to derive a concise and interpretableprocess model
4 Experiments
In this section we conducted a series of experiments todemonstrate the effectiveness of the proposed methods indiscovering quality CP model We begin with the descriptionof experimental settings
41 Experimental Settings To evaluate both the suitabilityand generality of our approach we collected two real-world billing datasets from a city-centre hospital Thetwo datasets are about two different diseases an internaldiseasemdashintracerebral hemorrhage (ICH) and a surgicaldiseasemdashinguinal hernia (IH) The detailed statistics aresummarized in Table 3 We set the Dirichlet prior toα = 1 0 and β = 0 01
The topic number K is determined by a trade-off strategyproposed in [6] It attempt to find a balance between theperplexity of topic modeling and the topic label size Theformer one is commonly used in evaluating the topic modelsrsquoprediction ability It is defined as the reciprocal geometricmean of the likelihood of the data Lower perplexity refersto better predictiveness and hence a better K while perplexityis not strongly correlated to human judgement [22] In ourapproach the size of topic label is critical to the concisenessof the final process model High topic number K wouldsignificantly reduce the interpretability of the model Thuswe select the intersection point of the two metrics acrossvarious K as the optimal topic number (K = 8 for ICH andK = 5 for IH) The iterations for updating relevant topic listδ are set to 300
42 Topic Quality The latent topics discovered by topicmodeling are interpreted as the clinical goals in ourapproach To evaluate the quality of the topics we com-pare our approach (CDG-LDA) against LDA [6] in threeaspects coherence redundancy and coverage Tables 4and 5 show the clustering results of the two methodsFor each topic we listed the top N = 10 ranked clinicalactivities and asked doctors to label it The similar topicsgenerated by different methods are in the same row Notethat given a top size N we defined (total top list) as allthe top N-ranked clinical activities among K topics Forexample refers to the 80 clinical activities for ICHwhen N = 10 and K = 8
421 Coherence We expect that the clinical activities in atopic could represent the similar clinical goal Take topic 5(the 5th row in Table 5) as the example It is observed thatthe result of LDA contains both surgery- and nursing care-related activities While in the result of CDG-LDA most ofthe clinical activities are about surgery The similar situations
Table 3 Statistics of our datasets
DiseaseTracenumber
Daynumber
Vocnumber
AvgLOS
MinLOS
MaxLOS
ICH 240 3204 752 14 2 34
IH 33 241 447 6 2 10
Input A dataset of D clinical daysTopic number K Hyper-parameters α and βIterations δ for updating relevant topic list
Output The multinomial distribution ϕ and θ1 Calculate IDF for each clinical activity2 Initialize the topic number upper bounder s v isin 1 K
for clinical activity with value v according to its IDF3 Initialize the relevant topic list κ v = 1 2hellip K for
clinical activity with value v4 Sample zgd for each group and Initialize all the count
parameters NkNkd and N v
k accordingly
5 for l = 1 to K do6 if l gt 1 then7 for v = 1 to V do8 if K minus l + 1 ge s v then9 Update κ v 10 for r = 1 to δ do11 for d = 1 to D do12 for g = 1 to Gd do13 k = zgd 14 Nk minus = Ag
d Nkd minus = Ag
d Nvk minus = Ag
d 15 Sample topic index kprime from κ v
according to (6)
16 Nkprime + = Agd N
kprimed + = Ag
d Nv
kprime + = Agd
17 Calculate and return ϕ (based on 8) and β
Algorithm 1 Gibbs sampling of CDG-LDA
7Journal of Healthcare Engineering
Table 4 Eight labeled topics of ICH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1 Imaging
(1) Doppler echocardiography (2) Dopplerultrasonography of left ventricular function(3) ventricular wall motion (4) transcranialDoppler (5) analysis of ultrasonic (6) color
Doppler ultrasonography of abdomen(7) X-radiography (8) digitized
photography (9) B mode ultrasoundand (10) film
Imaging
(1) Doppler echocardiography(2) Doppler ultrasonography of
left ventricular function (3) ventricularwall motion (4) transcranial Doppler(5) analysis of ultrasonic (6) color
Doppler ultrasonography of the abdomen(7) X-radiography (8) B mode ultrasound(9) digitized photography and (10) film
2 Medication
(1) Aceglutamide (2) sodium chloride(3) local infiltration anesthesia (4) etimicin(5) cerebrosidekinin (6) physical cooling(7) t-branch pipe (8) aerosol inhalation(9) ambroxol and (10) venous transfusion
Medication
(1) Pantoprazole (2) potassium magnesiumaspartate (3) mannitol (4) vitamin B6(5) glycerin fructose (6) vitamin C
(7) naloxone (8) piracetam (9) sodiumchloride and (10) glucose
3High-levelnursing care
(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) urinary meatus care (5) mouth care(6) skin care (7) hospital examining fee(8) mouth care package (9) ambulatory
blood pressure monitoring and(10) continuous oxygen inhalation
High-levelnursing care
(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) mouth care (5) urinary meatus care(6) skin care (7) continuous oxygeninhalation (8) mouth care package
(9) ambulatory blood pressure monitoringand (10) drainage pack
4Regular nursing
care andmedication
(1) Level II care (2) hospital examining fee(3) pantoprazole (4) vitamin B6
(5) mannitol (6) vitamin C (7) potassiummagnesium aspartate (8) venous transfusion
(9) sodium chloride and (10) glycerinfructose
Regularnursing care
(1) Level II care (2) venous transfusion(3) flusher (4) syringe (5) arterialvenous
cannula (6) therapy application(7) aceglutamide (8) intramuscular
injection (9) physical cooling venipunctureand (10) nifedipine
5Admissionexamination
(1) ECG (2) PLGA (3) fibrous protein(4) blood collection tube (5) hemostix
(6) washbasin (7) ECG event log(8) plasma prothrombin time assay
(9) activated partial thromboplastin timeand (10) prothrombin time assay
Admissionexamination
(1) Brain CT (2) venous sampling (3) ECG(4) hemostix (5) electrode (6) 12 channeldynamic electrocardiogram (7) washbasin(8) ECG event log (9) nasogastric and(10) evaluation of activities of daily living
6Biochemical
exam
(1) Urea assay (2) creatinine assay (3) uricacid assay (4) chloride assay (5) blood
cell analysis (6) potassium assay (7) sodiumassay (8) glucose assay (9) serum
bicarbonate assay and (10) excrement
Biochemicalexam
(1) Creatinine assay (2) urea assay(3) uric acid assay (4) blood cell
analysis (5) chloride assay (6) sodiumassay (7) potassium assay (8) serumbicarbonate assay (9) glucose assay
and (10) calcium assay
7Biochemical
exam
(1) Serum a-L-glucosidase assay (2) serum5primenucleotidase assay (3) plasma viscosityassay (4) glycosylated hemoglobin assay(5) whole blood viscosity (6) identification
of Rh blood group antigen (7) amylase assay(8) serum creatine kinase-MB isoenzyme
activity assay (9) lactate dehydrogenase assayand (10) serum alanine aminotransferase assay
Biochemicalexam
(1) Serum 5primenucleotidase assay (2) seruma-L-glucosidase assay (3) plasma viscosity
assay (4) serum creatine kinase-MBisoenzyme activity assay (5) glycosylated
hemoglobin assay (6) whole blood viscosity(7) identification of Rh blood group antigen(8) amylase assay (9) lactate dehydrogenaseassay and (10) serum total protein assay
8Biochemical
exam
(1) Serum aspartate aminotransferase assay(2) electrophoresis analysis of lactatedehydrogenase isoenzymes (3) urine
analysis (4) serum 7 pancreaticacyltransferase assay (5) serum alkalinephosphatase assay (6) serum alanineaminotransferase assay (7) urinalysis
(8) hepatitis A antibody assay (9) serumhigh-density lipoprotein cholesterol assayand (10) serum total cholesterol assay
Biochemicalexam
(1) Serum aspartate aminotransferase assay(2) serum albumin assay (3) urine sediment
quantitative analyze (4) inorganicphosphorus assay (5) electrophoresisanalysis of lactate dehydrogenase
isoenzymes (6) serum fructose amine assay(7) urine analysis (8) serum 7 pancreaticacyltransferase assay (9) serum alanineaminotransferase assay and (10) serum
alkaline phosphatase assay
8 Journal of Healthcare Engineering
can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice
We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree
NKQM N= 1KK
k=1
N
j=1 score Mkj log j + 1ZN
10
where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM
422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic
Table 5 Five labeled topics of IH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1Biochemical
exam
(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay
(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay
Biochemicalexam
(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay
(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay
(9) calcium assay and (10) magnesium assay
2Infectious
disease exam
(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay
(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy
Presurgicalpreparation
(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP
monitoring (9) cefuroxime and(10) infusion apparatus
3Admission
exam
(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines
(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin
(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen
Admissionexam
(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy
(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and
(10) analysis of ultrasonic
4Regular
nursing care
(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix
(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula
Regularnursing care
(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam
(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change
(9) levofloxacin and (10) middle dressing change
5Surgery
and regularnursing care
(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital
examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride
Surgery
(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application
(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring
Table 6 Topic coherence in terms of NKQMN
Dataset Method NKQM5 NKQM10 NKQM20
ICHLDA 08211 08004 07907
CDG-LDA 08448 08498 08235
IHLDA 07915 07830 07631
CDG-LDA 08316 08266 08116
9Journal of Healthcare Engineering
of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics
RE =visinΩ count v minus 1 11
whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the
(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result
Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics
423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects
(i) Examination
For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5
For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part
(ii) Treatment
Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance
3 7 11 15 19 3 7 11 15 19
0
20
40
60
80
100
ICH
0
50
100
150
200IH
LDA (N = 10) N = 10)LDA (N = 20) N = 20)
CDG-LDA (CDG-LDA (
Figure 4 Redundancy indicator RE on various topic number K and top size N
10 Journal of Healthcare Engineering
controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset
IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4
(iii) Nursing care
In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment
(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals
43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we
got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways
(i) ICH
The process can be easily divided into three stages
(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations
(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms
(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states
5 6 7 8 3
2 3
2
4
1 2 4 5
Figure 5 ICH process model
3 1
2
5 2
4
Figure 6 IH process model
11Journal of Healthcare Engineering
(ii) IH
Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery
(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP
5 Conclusion
In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation
One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy
Conflicts of Interest
The authors declare that there is no conflict of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC
References
[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003
[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013
[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014
[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014
[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016
[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016
[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001
[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005
[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009
[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003
[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008
[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009
[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012
[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015
[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013
[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013
[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with
12 Journal of Healthcare Engineering
visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015
[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012
[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015
[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014
[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016
[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009
[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014
13Journal of Healthcare Engineering
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal of
Volume 201
Submit your manuscripts athttpswwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
inference method for LDA It is based on the joint dis-tribution of latent topics and observed clinical activitiesas follows
P Z A α β = P Z α sdotP A Z β
= P Z θ P θ α dθ sdot P A Z ϕ P ϕ β dϕ
prop prodK
k=1prodD
d=1Γ N k
d + αk sdotprodV
v=1Γ N vk + βv
Γ Nk +V
v=1βv
1where Γ middot is the gamma function
Given the joint distribution the topic assignment foreach clinical activity can be written as follows
P zdi = k Z zdi A = P Z AP Z zdi A
propN v
k zdi + βv
V
v=1Nvk zdi + βv
sdot N kd zdi + αk
2
After the Gibbs sampling the two important hiddenparameters ϕ and θ are computed as follows
ϕ vk = N v
k + βv
V
v=1Nvk + βv
θ kd = N k
d + αk
K
k=1Nkd + αk
3
where ϕ vk is the probability of clinical activity v which
belongs to topic k and θ kd is the probability of the dth clinical
day which contains topic k
33 Topic Assignment Constraint We can see that theinference procedure is mutually independent for all theclinical activities (as shown in (2)) So that even the sameclinical activities in one clinical day may get different topicswhich violates to the clinical practice
In this section we adopt the chain graph proposed in [21]to incorporate topic assignment constraint into LDA It cancoerce the same clinical activities on one clinical day to
Table 2 Meanings of the notations
Notation Meaning
D The set of clinical days
A The set of clinical activities
Z The set of topics assigned to A
D K The number of clinical days and topics
V The number of unique clinical activities
Nd The number of clinical activities in dth clinical day
NkThe number of clinical activities that are assigned
topic k
GdThe number of groups (unique clinical activities)
in dth clinical day
Agd
The number of clinical activities in gth group indth clinical day
adi The ith clinical activities in dth clinical day
zdi The topic for adiagd The clinical activities of gth group in dth clinical day
agdj The jth clinical activity in gth group in dth clinical day
zgdj The topic for agdj
zgdzgdj
Agd
j=1 the set of topics for gth group in dth
clinical day
N kd
The number of clinical activities that are assignedtopic k in dth clinical day
N vk
The number of clinical activities with value v and topic k
N kd zgd
The number of clinical activities that are assigned topick in dth clinical day except gth group in dth clinical day
N vk zgd
The number of clinical activities with value v andtopic k except gth group in dth clinical day
α β Dirichlet prior vector
ϕ θ Multinomial distribution over clinical activitiesand topics
ϕkMultinomial distribution over clinical activities for
topic k
θd Multinomial distribution over topics for dth clinical day
훼
훽
휃d
k
zdi
adi
i
d
kisin[1 K] [1 Nd][1 D]
isin
isin
d
k
d1
d1
d
k [1 K]
d2
d2
ds
ds
s= d
g g g
g g g
g [1Gd][1D]
isin isin
isin
(a) (b)
Figure 3 Graphical representation of (a) LDA and (b) CDG-LDA
5Journal of Healthcare Engineering
take on the same topic The graphical model is shown inFigure 3(b) In each clinical day we first put all the sameclinical activities into a group and then use indirect edgesto connect them A function Cg
d is imposed to express thetopic assignment constraint as follows
C zgd = 1 if zgd1 = zgd2 =⋯ = zgdAg
d
0 otherwise4
For each group Cgd would equal to 1 only when all the
clinical activities in the group get the same topic Our goalis to make all the groups conform to the constraint Sothat we add Cg
d in the joint distribution of LDA (1) as follows
Pprime Z A = 1Δ P Z A prod
dgC zgd 5
where Δ is the normalization constantSimilar to the Gibbs sampling algorithm for LDA we
sample a topic for a group based on its posteriorPprime zgd = k Z zgd
A where zgd = k represents the set zgd which
has the same topic k (the notations are explained in Table 2)
Pprime zgd = k Z zgd A = P Z A
P Z zgd A
propΓ N k
d + αk
Γ N kd zgd
+ αk
sdotΓ N
αgd
k + βαgd
Γ Nk +V
vβv
Γ Nαgd
k zgd+ βα
gd
Γ Nk zgd+V
vβv
=Γ N k
d zgd+ Ag
d + αk
Γ N kd zgd
+ αk
sdotΓ N
agdk zgd
+ Agd + βα
gd
Γ Nagd
k zgd+ βα
gd
sdotΓ Nk zgd
+V
vβv
Γ Nk zgd+ Ag
d +V
vβv
= prodAgd
j=1N k
k zgd+ αk + jminus 1
sdotN
αgd
k zgd+ βα
gd+ jminus 1
Nk zgd+V
vβv + jminus 1
6
34 Topic Correlation Limitation The ranking of a clinical
activity v in a topic k is determined by ϕ vk From (3) it is
observed thatN vk is critical to the calculation of ϕ v
k In addi-
tion the sum of N vk among all topics is a fixed value which
equals to the frequency of clinical activities v (denoted as
N v ) in all clinical days Due to the independence of theinference procedure of LDA the frequency of v may be uni-formly allocated to various topics Thus a high-frequencyclinical activity is much easier to rank high in many topicswhile a low-frequency clinical activity can hardly rank highin even one topic It results in a large redundancy betweendifferent topics so that each topic is lacking of characteristicsIt is hard to regard these topics as the clinical goals
We consider using informativeness feature to solve thisproblem Our insight is that for an informative clinicalactivity it should rank high in a small number of topicswhich means that it can better represent these clinical goalsthan other uninformative clinical activities We incorporatetopic correlation limitation into LDA to achieve this targetThis limitation contains two aspects
(1) Restrict the distribution of N v over topics Higherinformativeness and less relevant topics
(2) Associate the calculation of ϕ vk with the informative-
ness of v
We use inverse document frequency (IDF see (7)) tomeasure the informativeness of each clinical activity Ingeneral informative clinical activities are expected to havehigher IDF and hence less relevant topics and higher ranking
IDF v = log Dd isin 1D v isinDd
7
Accordingly the new calculation of ϕ vk is denoted
as follows
ϕvk = ϕ v
k sdot IDF v 8
The core principle for the first aspect is to keep discardingthe most irrelevant topic for a clinical activity It is achievedby maintaining a relevant topic list for each clinical activityduring the iteration of Gibbs sampling The pseudocode forthis procedure is shown in Algorithm 1
The topic number upper bounder refers to the maximumnumber of the topics a clinical activity can correlate to Forexample given a clinical activity ECG and its topic numberupper bounder s ECG = 3 its N v can be only allocated tothree topics We set s v from 1 to K according to its IDFThen for each clinical activity value v we maintain acorresponding relevant topic list κ v In every δ iterationwe check whether the size of κ v has been reduced to thedesired number s v (line 12) To the clinical activities whichstill need to be restricted we will update their lists (line 13) bydiscarding the topic with minimum relevant value (RV) asdefined in
RV k v = N vk
rank k v 9
where rank k v is the ranking of clinical activity v in topic kbased on the multinomial distribution ϕk The numerator
N vk reflects the relevant degree of topic k to clinical activity
6 Journal of Healthcare Engineering
v And the denominator rank k v reflects the importance ofclinical activity v to topic k Thus higher RV represent morecorrelation between the clinical activity and the topic
By keeping on updating the relevant topic lists wegradually reduce the dimension of topic selection to thedesired range For instance suppose there are K = 5 topicsand a clinical activity ECG with a RV vector 50 150 5100 60 and s ECG = 3 the relevant topic list k ECGwould be changed from 1 2 3 4 5 to 1 2 4 5 in the nextupdating procedure As seen in line 15 of Algorithm 1 we dothe topic sampling from relevant topic list instead of all thetopics To the preinstance it means that we would sample atopic from topics 1 2 4 and 5 without topic 3 because ofits minimum RV
The complexity of deriving a sample zgd is K where K is the number of topics Therefore the overall com-plexity is KδGK = K2δG δ is the number of iterationfor updating relevant topic list and G =sumD
d Gd In practice Kwould not be a large value and δ could be treated as a con-stant so that G is the key factor that contributes to thecomplexity It means that the proposed method scales lin-early as we increase the total number of groups
35 Process Mining Instead of the detailed and complexclinical activities we use the discovered topics to representeach clinical day Hence each patient can be regarded as atopic-based sequence We adopt the process mining frame-work proposed in [6] to derive a concise and interpretableprocess model
4 Experiments
In this section we conducted a series of experiments todemonstrate the effectiveness of the proposed methods indiscovering quality CP model We begin with the descriptionof experimental settings
41 Experimental Settings To evaluate both the suitabilityand generality of our approach we collected two real-world billing datasets from a city-centre hospital Thetwo datasets are about two different diseases an internaldiseasemdashintracerebral hemorrhage (ICH) and a surgicaldiseasemdashinguinal hernia (IH) The detailed statistics aresummarized in Table 3 We set the Dirichlet prior toα = 1 0 and β = 0 01
The topic number K is determined by a trade-off strategyproposed in [6] It attempt to find a balance between theperplexity of topic modeling and the topic label size Theformer one is commonly used in evaluating the topic modelsrsquoprediction ability It is defined as the reciprocal geometricmean of the likelihood of the data Lower perplexity refersto better predictiveness and hence a better K while perplexityis not strongly correlated to human judgement [22] In ourapproach the size of topic label is critical to the concisenessof the final process model High topic number K wouldsignificantly reduce the interpretability of the model Thuswe select the intersection point of the two metrics acrossvarious K as the optimal topic number (K = 8 for ICH andK = 5 for IH) The iterations for updating relevant topic listδ are set to 300
42 Topic Quality The latent topics discovered by topicmodeling are interpreted as the clinical goals in ourapproach To evaluate the quality of the topics we com-pare our approach (CDG-LDA) against LDA [6] in threeaspects coherence redundancy and coverage Tables 4and 5 show the clustering results of the two methodsFor each topic we listed the top N = 10 ranked clinicalactivities and asked doctors to label it The similar topicsgenerated by different methods are in the same row Notethat given a top size N we defined (total top list) as allthe top N-ranked clinical activities among K topics Forexample refers to the 80 clinical activities for ICHwhen N = 10 and K = 8
421 Coherence We expect that the clinical activities in atopic could represent the similar clinical goal Take topic 5(the 5th row in Table 5) as the example It is observed thatthe result of LDA contains both surgery- and nursing care-related activities While in the result of CDG-LDA most ofthe clinical activities are about surgery The similar situations
Table 3 Statistics of our datasets
DiseaseTracenumber
Daynumber
Vocnumber
AvgLOS
MinLOS
MaxLOS
ICH 240 3204 752 14 2 34
IH 33 241 447 6 2 10
Input A dataset of D clinical daysTopic number K Hyper-parameters α and βIterations δ for updating relevant topic list
Output The multinomial distribution ϕ and θ1 Calculate IDF for each clinical activity2 Initialize the topic number upper bounder s v isin 1 K
for clinical activity with value v according to its IDF3 Initialize the relevant topic list κ v = 1 2hellip K for
clinical activity with value v4 Sample zgd for each group and Initialize all the count
parameters NkNkd and N v
k accordingly
5 for l = 1 to K do6 if l gt 1 then7 for v = 1 to V do8 if K minus l + 1 ge s v then9 Update κ v 10 for r = 1 to δ do11 for d = 1 to D do12 for g = 1 to Gd do13 k = zgd 14 Nk minus = Ag
d Nkd minus = Ag
d Nvk minus = Ag
d 15 Sample topic index kprime from κ v
according to (6)
16 Nkprime + = Agd N
kprimed + = Ag
d Nv
kprime + = Agd
17 Calculate and return ϕ (based on 8) and β
Algorithm 1 Gibbs sampling of CDG-LDA
7Journal of Healthcare Engineering
Table 4 Eight labeled topics of ICH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1 Imaging
(1) Doppler echocardiography (2) Dopplerultrasonography of left ventricular function(3) ventricular wall motion (4) transcranialDoppler (5) analysis of ultrasonic (6) color
Doppler ultrasonography of abdomen(7) X-radiography (8) digitized
photography (9) B mode ultrasoundand (10) film
Imaging
(1) Doppler echocardiography(2) Doppler ultrasonography of
left ventricular function (3) ventricularwall motion (4) transcranial Doppler(5) analysis of ultrasonic (6) color
Doppler ultrasonography of the abdomen(7) X-radiography (8) B mode ultrasound(9) digitized photography and (10) film
2 Medication
(1) Aceglutamide (2) sodium chloride(3) local infiltration anesthesia (4) etimicin(5) cerebrosidekinin (6) physical cooling(7) t-branch pipe (8) aerosol inhalation(9) ambroxol and (10) venous transfusion
Medication
(1) Pantoprazole (2) potassium magnesiumaspartate (3) mannitol (4) vitamin B6(5) glycerin fructose (6) vitamin C
(7) naloxone (8) piracetam (9) sodiumchloride and (10) glucose
3High-levelnursing care
(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) urinary meatus care (5) mouth care(6) skin care (7) hospital examining fee(8) mouth care package (9) ambulatory
blood pressure monitoring and(10) continuous oxygen inhalation
High-levelnursing care
(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) mouth care (5) urinary meatus care(6) skin care (7) continuous oxygeninhalation (8) mouth care package
(9) ambulatory blood pressure monitoringand (10) drainage pack
4Regular nursing
care andmedication
(1) Level II care (2) hospital examining fee(3) pantoprazole (4) vitamin B6
(5) mannitol (6) vitamin C (7) potassiummagnesium aspartate (8) venous transfusion
(9) sodium chloride and (10) glycerinfructose
Regularnursing care
(1) Level II care (2) venous transfusion(3) flusher (4) syringe (5) arterialvenous
cannula (6) therapy application(7) aceglutamide (8) intramuscular
injection (9) physical cooling venipunctureand (10) nifedipine
5Admissionexamination
(1) ECG (2) PLGA (3) fibrous protein(4) blood collection tube (5) hemostix
(6) washbasin (7) ECG event log(8) plasma prothrombin time assay
(9) activated partial thromboplastin timeand (10) prothrombin time assay
Admissionexamination
(1) Brain CT (2) venous sampling (3) ECG(4) hemostix (5) electrode (6) 12 channeldynamic electrocardiogram (7) washbasin(8) ECG event log (9) nasogastric and(10) evaluation of activities of daily living
6Biochemical
exam
(1) Urea assay (2) creatinine assay (3) uricacid assay (4) chloride assay (5) blood
cell analysis (6) potassium assay (7) sodiumassay (8) glucose assay (9) serum
bicarbonate assay and (10) excrement
Biochemicalexam
(1) Creatinine assay (2) urea assay(3) uric acid assay (4) blood cell
analysis (5) chloride assay (6) sodiumassay (7) potassium assay (8) serumbicarbonate assay (9) glucose assay
and (10) calcium assay
7Biochemical
exam
(1) Serum a-L-glucosidase assay (2) serum5primenucleotidase assay (3) plasma viscosityassay (4) glycosylated hemoglobin assay(5) whole blood viscosity (6) identification
of Rh blood group antigen (7) amylase assay(8) serum creatine kinase-MB isoenzyme
activity assay (9) lactate dehydrogenase assayand (10) serum alanine aminotransferase assay
Biochemicalexam
(1) Serum 5primenucleotidase assay (2) seruma-L-glucosidase assay (3) plasma viscosity
assay (4) serum creatine kinase-MBisoenzyme activity assay (5) glycosylated
hemoglobin assay (6) whole blood viscosity(7) identification of Rh blood group antigen(8) amylase assay (9) lactate dehydrogenaseassay and (10) serum total protein assay
8Biochemical
exam
(1) Serum aspartate aminotransferase assay(2) electrophoresis analysis of lactatedehydrogenase isoenzymes (3) urine
analysis (4) serum 7 pancreaticacyltransferase assay (5) serum alkalinephosphatase assay (6) serum alanineaminotransferase assay (7) urinalysis
(8) hepatitis A antibody assay (9) serumhigh-density lipoprotein cholesterol assayand (10) serum total cholesterol assay
Biochemicalexam
(1) Serum aspartate aminotransferase assay(2) serum albumin assay (3) urine sediment
quantitative analyze (4) inorganicphosphorus assay (5) electrophoresisanalysis of lactate dehydrogenase
isoenzymes (6) serum fructose amine assay(7) urine analysis (8) serum 7 pancreaticacyltransferase assay (9) serum alanineaminotransferase assay and (10) serum
alkaline phosphatase assay
8 Journal of Healthcare Engineering
can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice
We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree
NKQM N= 1KK
k=1
N
j=1 score Mkj log j + 1ZN
10
where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM
422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic
Table 5 Five labeled topics of IH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1Biochemical
exam
(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay
(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay
Biochemicalexam
(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay
(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay
(9) calcium assay and (10) magnesium assay
2Infectious
disease exam
(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay
(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy
Presurgicalpreparation
(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP
monitoring (9) cefuroxime and(10) infusion apparatus
3Admission
exam
(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines
(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin
(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen
Admissionexam
(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy
(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and
(10) analysis of ultrasonic
4Regular
nursing care
(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix
(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula
Regularnursing care
(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam
(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change
(9) levofloxacin and (10) middle dressing change
5Surgery
and regularnursing care
(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital
examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride
Surgery
(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application
(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring
Table 6 Topic coherence in terms of NKQMN
Dataset Method NKQM5 NKQM10 NKQM20
ICHLDA 08211 08004 07907
CDG-LDA 08448 08498 08235
IHLDA 07915 07830 07631
CDG-LDA 08316 08266 08116
9Journal of Healthcare Engineering
of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics
RE =visinΩ count v minus 1 11
whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the
(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result
Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics
423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects
(i) Examination
For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5
For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part
(ii) Treatment
Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance
3 7 11 15 19 3 7 11 15 19
0
20
40
60
80
100
ICH
0
50
100
150
200IH
LDA (N = 10) N = 10)LDA (N = 20) N = 20)
CDG-LDA (CDG-LDA (
Figure 4 Redundancy indicator RE on various topic number K and top size N
10 Journal of Healthcare Engineering
controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset
IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4
(iii) Nursing care
In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment
(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals
43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we
got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways
(i) ICH
The process can be easily divided into three stages
(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations
(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms
(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states
5 6 7 8 3
2 3
2
4
1 2 4 5
Figure 5 ICH process model
3 1
2
5 2
4
Figure 6 IH process model
11Journal of Healthcare Engineering
(ii) IH
Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery
(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP
5 Conclusion
In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation
One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy
Conflicts of Interest
The authors declare that there is no conflict of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC
References
[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003
[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013
[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014
[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014
[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016
[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016
[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001
[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005
[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009
[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003
[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008
[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009
[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012
[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015
[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013
[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013
[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with
12 Journal of Healthcare Engineering
visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015
[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012
[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015
[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014
[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016
[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009
[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014
13Journal of Healthcare Engineering
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal of
Volume 201
Submit your manuscripts athttpswwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
take on the same topic The graphical model is shown inFigure 3(b) In each clinical day we first put all the sameclinical activities into a group and then use indirect edgesto connect them A function Cg
d is imposed to express thetopic assignment constraint as follows
C zgd = 1 if zgd1 = zgd2 =⋯ = zgdAg
d
0 otherwise4
For each group Cgd would equal to 1 only when all the
clinical activities in the group get the same topic Our goalis to make all the groups conform to the constraint Sothat we add Cg
d in the joint distribution of LDA (1) as follows
Pprime Z A = 1Δ P Z A prod
dgC zgd 5
where Δ is the normalization constantSimilar to the Gibbs sampling algorithm for LDA we
sample a topic for a group based on its posteriorPprime zgd = k Z zgd
A where zgd = k represents the set zgd which
has the same topic k (the notations are explained in Table 2)
Pprime zgd = k Z zgd A = P Z A
P Z zgd A
propΓ N k
d + αk
Γ N kd zgd
+ αk
sdotΓ N
αgd
k + βαgd
Γ Nk +V
vβv
Γ Nαgd
k zgd+ βα
gd
Γ Nk zgd+V
vβv
=Γ N k
d zgd+ Ag
d + αk
Γ N kd zgd
+ αk
sdotΓ N
agdk zgd
+ Agd + βα
gd
Γ Nagd
k zgd+ βα
gd
sdotΓ Nk zgd
+V
vβv
Γ Nk zgd+ Ag
d +V
vβv
= prodAgd
j=1N k
k zgd+ αk + jminus 1
sdotN
αgd
k zgd+ βα
gd+ jminus 1
Nk zgd+V
vβv + jminus 1
6
34 Topic Correlation Limitation The ranking of a clinical
activity v in a topic k is determined by ϕ vk From (3) it is
observed thatN vk is critical to the calculation of ϕ v
k In addi-
tion the sum of N vk among all topics is a fixed value which
equals to the frequency of clinical activities v (denoted as
N v ) in all clinical days Due to the independence of theinference procedure of LDA the frequency of v may be uni-formly allocated to various topics Thus a high-frequencyclinical activity is much easier to rank high in many topicswhile a low-frequency clinical activity can hardly rank highin even one topic It results in a large redundancy betweendifferent topics so that each topic is lacking of characteristicsIt is hard to regard these topics as the clinical goals
We consider using informativeness feature to solve thisproblem Our insight is that for an informative clinicalactivity it should rank high in a small number of topicswhich means that it can better represent these clinical goalsthan other uninformative clinical activities We incorporatetopic correlation limitation into LDA to achieve this targetThis limitation contains two aspects
(1) Restrict the distribution of N v over topics Higherinformativeness and less relevant topics
(2) Associate the calculation of ϕ vk with the informative-
ness of v
We use inverse document frequency (IDF see (7)) tomeasure the informativeness of each clinical activity Ingeneral informative clinical activities are expected to havehigher IDF and hence less relevant topics and higher ranking
IDF v = log Dd isin 1D v isinDd
7
Accordingly the new calculation of ϕ vk is denoted
as follows
ϕvk = ϕ v
k sdot IDF v 8
The core principle for the first aspect is to keep discardingthe most irrelevant topic for a clinical activity It is achievedby maintaining a relevant topic list for each clinical activityduring the iteration of Gibbs sampling The pseudocode forthis procedure is shown in Algorithm 1
The topic number upper bounder refers to the maximumnumber of the topics a clinical activity can correlate to Forexample given a clinical activity ECG and its topic numberupper bounder s ECG = 3 its N v can be only allocated tothree topics We set s v from 1 to K according to its IDFThen for each clinical activity value v we maintain acorresponding relevant topic list κ v In every δ iterationwe check whether the size of κ v has been reduced to thedesired number s v (line 12) To the clinical activities whichstill need to be restricted we will update their lists (line 13) bydiscarding the topic with minimum relevant value (RV) asdefined in
RV k v = N vk
rank k v 9
where rank k v is the ranking of clinical activity v in topic kbased on the multinomial distribution ϕk The numerator
N vk reflects the relevant degree of topic k to clinical activity
6 Journal of Healthcare Engineering
v And the denominator rank k v reflects the importance ofclinical activity v to topic k Thus higher RV represent morecorrelation between the clinical activity and the topic
By keeping on updating the relevant topic lists wegradually reduce the dimension of topic selection to thedesired range For instance suppose there are K = 5 topicsand a clinical activity ECG with a RV vector 50 150 5100 60 and s ECG = 3 the relevant topic list k ECGwould be changed from 1 2 3 4 5 to 1 2 4 5 in the nextupdating procedure As seen in line 15 of Algorithm 1 we dothe topic sampling from relevant topic list instead of all thetopics To the preinstance it means that we would sample atopic from topics 1 2 4 and 5 without topic 3 because ofits minimum RV
The complexity of deriving a sample zgd is K where K is the number of topics Therefore the overall com-plexity is KδGK = K2δG δ is the number of iterationfor updating relevant topic list and G =sumD
d Gd In practice Kwould not be a large value and δ could be treated as a con-stant so that G is the key factor that contributes to thecomplexity It means that the proposed method scales lin-early as we increase the total number of groups
35 Process Mining Instead of the detailed and complexclinical activities we use the discovered topics to representeach clinical day Hence each patient can be regarded as atopic-based sequence We adopt the process mining frame-work proposed in [6] to derive a concise and interpretableprocess model
4 Experiments
In this section we conducted a series of experiments todemonstrate the effectiveness of the proposed methods indiscovering quality CP model We begin with the descriptionof experimental settings
41 Experimental Settings To evaluate both the suitabilityand generality of our approach we collected two real-world billing datasets from a city-centre hospital Thetwo datasets are about two different diseases an internaldiseasemdashintracerebral hemorrhage (ICH) and a surgicaldiseasemdashinguinal hernia (IH) The detailed statistics aresummarized in Table 3 We set the Dirichlet prior toα = 1 0 and β = 0 01
The topic number K is determined by a trade-off strategyproposed in [6] It attempt to find a balance between theperplexity of topic modeling and the topic label size Theformer one is commonly used in evaluating the topic modelsrsquoprediction ability It is defined as the reciprocal geometricmean of the likelihood of the data Lower perplexity refersto better predictiveness and hence a better K while perplexityis not strongly correlated to human judgement [22] In ourapproach the size of topic label is critical to the concisenessof the final process model High topic number K wouldsignificantly reduce the interpretability of the model Thuswe select the intersection point of the two metrics acrossvarious K as the optimal topic number (K = 8 for ICH andK = 5 for IH) The iterations for updating relevant topic listδ are set to 300
42 Topic Quality The latent topics discovered by topicmodeling are interpreted as the clinical goals in ourapproach To evaluate the quality of the topics we com-pare our approach (CDG-LDA) against LDA [6] in threeaspects coherence redundancy and coverage Tables 4and 5 show the clustering results of the two methodsFor each topic we listed the top N = 10 ranked clinicalactivities and asked doctors to label it The similar topicsgenerated by different methods are in the same row Notethat given a top size N we defined (total top list) as allthe top N-ranked clinical activities among K topics Forexample refers to the 80 clinical activities for ICHwhen N = 10 and K = 8
421 Coherence We expect that the clinical activities in atopic could represent the similar clinical goal Take topic 5(the 5th row in Table 5) as the example It is observed thatthe result of LDA contains both surgery- and nursing care-related activities While in the result of CDG-LDA most ofthe clinical activities are about surgery The similar situations
Table 3 Statistics of our datasets
DiseaseTracenumber
Daynumber
Vocnumber
AvgLOS
MinLOS
MaxLOS
ICH 240 3204 752 14 2 34
IH 33 241 447 6 2 10
Input A dataset of D clinical daysTopic number K Hyper-parameters α and βIterations δ for updating relevant topic list
Output The multinomial distribution ϕ and θ1 Calculate IDF for each clinical activity2 Initialize the topic number upper bounder s v isin 1 K
for clinical activity with value v according to its IDF3 Initialize the relevant topic list κ v = 1 2hellip K for
clinical activity with value v4 Sample zgd for each group and Initialize all the count
parameters NkNkd and N v
k accordingly
5 for l = 1 to K do6 if l gt 1 then7 for v = 1 to V do8 if K minus l + 1 ge s v then9 Update κ v 10 for r = 1 to δ do11 for d = 1 to D do12 for g = 1 to Gd do13 k = zgd 14 Nk minus = Ag
d Nkd minus = Ag
d Nvk minus = Ag
d 15 Sample topic index kprime from κ v
according to (6)
16 Nkprime + = Agd N
kprimed + = Ag
d Nv
kprime + = Agd
17 Calculate and return ϕ (based on 8) and β
Algorithm 1 Gibbs sampling of CDG-LDA
7Journal of Healthcare Engineering
Table 4 Eight labeled topics of ICH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1 Imaging
(1) Doppler echocardiography (2) Dopplerultrasonography of left ventricular function(3) ventricular wall motion (4) transcranialDoppler (5) analysis of ultrasonic (6) color
Doppler ultrasonography of abdomen(7) X-radiography (8) digitized
photography (9) B mode ultrasoundand (10) film
Imaging
(1) Doppler echocardiography(2) Doppler ultrasonography of
left ventricular function (3) ventricularwall motion (4) transcranial Doppler(5) analysis of ultrasonic (6) color
Doppler ultrasonography of the abdomen(7) X-radiography (8) B mode ultrasound(9) digitized photography and (10) film
2 Medication
(1) Aceglutamide (2) sodium chloride(3) local infiltration anesthesia (4) etimicin(5) cerebrosidekinin (6) physical cooling(7) t-branch pipe (8) aerosol inhalation(9) ambroxol and (10) venous transfusion
Medication
(1) Pantoprazole (2) potassium magnesiumaspartate (3) mannitol (4) vitamin B6(5) glycerin fructose (6) vitamin C
(7) naloxone (8) piracetam (9) sodiumchloride and (10) glucose
3High-levelnursing care
(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) urinary meatus care (5) mouth care(6) skin care (7) hospital examining fee(8) mouth care package (9) ambulatory
blood pressure monitoring and(10) continuous oxygen inhalation
High-levelnursing care
(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) mouth care (5) urinary meatus care(6) skin care (7) continuous oxygeninhalation (8) mouth care package
(9) ambulatory blood pressure monitoringand (10) drainage pack
4Regular nursing
care andmedication
(1) Level II care (2) hospital examining fee(3) pantoprazole (4) vitamin B6
(5) mannitol (6) vitamin C (7) potassiummagnesium aspartate (8) venous transfusion
(9) sodium chloride and (10) glycerinfructose
Regularnursing care
(1) Level II care (2) venous transfusion(3) flusher (4) syringe (5) arterialvenous
cannula (6) therapy application(7) aceglutamide (8) intramuscular
injection (9) physical cooling venipunctureand (10) nifedipine
5Admissionexamination
(1) ECG (2) PLGA (3) fibrous protein(4) blood collection tube (5) hemostix
(6) washbasin (7) ECG event log(8) plasma prothrombin time assay
(9) activated partial thromboplastin timeand (10) prothrombin time assay
Admissionexamination
(1) Brain CT (2) venous sampling (3) ECG(4) hemostix (5) electrode (6) 12 channeldynamic electrocardiogram (7) washbasin(8) ECG event log (9) nasogastric and(10) evaluation of activities of daily living
6Biochemical
exam
(1) Urea assay (2) creatinine assay (3) uricacid assay (4) chloride assay (5) blood
cell analysis (6) potassium assay (7) sodiumassay (8) glucose assay (9) serum
bicarbonate assay and (10) excrement
Biochemicalexam
(1) Creatinine assay (2) urea assay(3) uric acid assay (4) blood cell
analysis (5) chloride assay (6) sodiumassay (7) potassium assay (8) serumbicarbonate assay (9) glucose assay
and (10) calcium assay
7Biochemical
exam
(1) Serum a-L-glucosidase assay (2) serum5primenucleotidase assay (3) plasma viscosityassay (4) glycosylated hemoglobin assay(5) whole blood viscosity (6) identification
of Rh blood group antigen (7) amylase assay(8) serum creatine kinase-MB isoenzyme
activity assay (9) lactate dehydrogenase assayand (10) serum alanine aminotransferase assay
Biochemicalexam
(1) Serum 5primenucleotidase assay (2) seruma-L-glucosidase assay (3) plasma viscosity
assay (4) serum creatine kinase-MBisoenzyme activity assay (5) glycosylated
hemoglobin assay (6) whole blood viscosity(7) identification of Rh blood group antigen(8) amylase assay (9) lactate dehydrogenaseassay and (10) serum total protein assay
8Biochemical
exam
(1) Serum aspartate aminotransferase assay(2) electrophoresis analysis of lactatedehydrogenase isoenzymes (3) urine
analysis (4) serum 7 pancreaticacyltransferase assay (5) serum alkalinephosphatase assay (6) serum alanineaminotransferase assay (7) urinalysis
(8) hepatitis A antibody assay (9) serumhigh-density lipoprotein cholesterol assayand (10) serum total cholesterol assay
Biochemicalexam
(1) Serum aspartate aminotransferase assay(2) serum albumin assay (3) urine sediment
quantitative analyze (4) inorganicphosphorus assay (5) electrophoresisanalysis of lactate dehydrogenase
isoenzymes (6) serum fructose amine assay(7) urine analysis (8) serum 7 pancreaticacyltransferase assay (9) serum alanineaminotransferase assay and (10) serum
alkaline phosphatase assay
8 Journal of Healthcare Engineering
can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice
We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree
NKQM N= 1KK
k=1
N
j=1 score Mkj log j + 1ZN
10
where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM
422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic
Table 5 Five labeled topics of IH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1Biochemical
exam
(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay
(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay
Biochemicalexam
(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay
(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay
(9) calcium assay and (10) magnesium assay
2Infectious
disease exam
(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay
(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy
Presurgicalpreparation
(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP
monitoring (9) cefuroxime and(10) infusion apparatus
3Admission
exam
(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines
(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin
(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen
Admissionexam
(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy
(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and
(10) analysis of ultrasonic
4Regular
nursing care
(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix
(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula
Regularnursing care
(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam
(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change
(9) levofloxacin and (10) middle dressing change
5Surgery
and regularnursing care
(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital
examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride
Surgery
(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application
(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring
Table 6 Topic coherence in terms of NKQMN
Dataset Method NKQM5 NKQM10 NKQM20
ICHLDA 08211 08004 07907
CDG-LDA 08448 08498 08235
IHLDA 07915 07830 07631
CDG-LDA 08316 08266 08116
9Journal of Healthcare Engineering
of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics
RE =visinΩ count v minus 1 11
whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the
(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result
Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics
423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects
(i) Examination
For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5
For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part
(ii) Treatment
Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance
3 7 11 15 19 3 7 11 15 19
0
20
40
60
80
100
ICH
0
50
100
150
200IH
LDA (N = 10) N = 10)LDA (N = 20) N = 20)
CDG-LDA (CDG-LDA (
Figure 4 Redundancy indicator RE on various topic number K and top size N
10 Journal of Healthcare Engineering
controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset
IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4
(iii) Nursing care
In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment
(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals
43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we
got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways
(i) ICH
The process can be easily divided into three stages
(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations
(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms
(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states
5 6 7 8 3
2 3
2
4
1 2 4 5
Figure 5 ICH process model
3 1
2
5 2
4
Figure 6 IH process model
11Journal of Healthcare Engineering
(ii) IH
Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery
(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP
5 Conclusion
In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation
One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy
Conflicts of Interest
The authors declare that there is no conflict of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC
References
[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003
[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013
[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014
[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014
[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016
[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016
[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001
[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005
[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009
[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003
[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008
[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009
[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012
[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015
[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013
[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013
[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with
12 Journal of Healthcare Engineering
visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015
[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012
[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015
[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014
[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016
[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009
[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014
13Journal of Healthcare Engineering
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal of
Volume 201
Submit your manuscripts athttpswwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
v And the denominator rank k v reflects the importance ofclinical activity v to topic k Thus higher RV represent morecorrelation between the clinical activity and the topic
By keeping on updating the relevant topic lists wegradually reduce the dimension of topic selection to thedesired range For instance suppose there are K = 5 topicsand a clinical activity ECG with a RV vector 50 150 5100 60 and s ECG = 3 the relevant topic list k ECGwould be changed from 1 2 3 4 5 to 1 2 4 5 in the nextupdating procedure As seen in line 15 of Algorithm 1 we dothe topic sampling from relevant topic list instead of all thetopics To the preinstance it means that we would sample atopic from topics 1 2 4 and 5 without topic 3 because ofits minimum RV
The complexity of deriving a sample zgd is K where K is the number of topics Therefore the overall com-plexity is KδGK = K2δG δ is the number of iterationfor updating relevant topic list and G =sumD
d Gd In practice Kwould not be a large value and δ could be treated as a con-stant so that G is the key factor that contributes to thecomplexity It means that the proposed method scales lin-early as we increase the total number of groups
35 Process Mining Instead of the detailed and complexclinical activities we use the discovered topics to representeach clinical day Hence each patient can be regarded as atopic-based sequence We adopt the process mining frame-work proposed in [6] to derive a concise and interpretableprocess model
4 Experiments
In this section we conducted a series of experiments todemonstrate the effectiveness of the proposed methods indiscovering quality CP model We begin with the descriptionof experimental settings
41 Experimental Settings To evaluate both the suitabilityand generality of our approach we collected two real-world billing datasets from a city-centre hospital Thetwo datasets are about two different diseases an internaldiseasemdashintracerebral hemorrhage (ICH) and a surgicaldiseasemdashinguinal hernia (IH) The detailed statistics aresummarized in Table 3 We set the Dirichlet prior toα = 1 0 and β = 0 01
The topic number K is determined by a trade-off strategyproposed in [6] It attempt to find a balance between theperplexity of topic modeling and the topic label size Theformer one is commonly used in evaluating the topic modelsrsquoprediction ability It is defined as the reciprocal geometricmean of the likelihood of the data Lower perplexity refersto better predictiveness and hence a better K while perplexityis not strongly correlated to human judgement [22] In ourapproach the size of topic label is critical to the concisenessof the final process model High topic number K wouldsignificantly reduce the interpretability of the model Thuswe select the intersection point of the two metrics acrossvarious K as the optimal topic number (K = 8 for ICH andK = 5 for IH) The iterations for updating relevant topic listδ are set to 300
42 Topic Quality The latent topics discovered by topicmodeling are interpreted as the clinical goals in ourapproach To evaluate the quality of the topics we com-pare our approach (CDG-LDA) against LDA [6] in threeaspects coherence redundancy and coverage Tables 4and 5 show the clustering results of the two methodsFor each topic we listed the top N = 10 ranked clinicalactivities and asked doctors to label it The similar topicsgenerated by different methods are in the same row Notethat given a top size N we defined (total top list) as allthe top N-ranked clinical activities among K topics Forexample refers to the 80 clinical activities for ICHwhen N = 10 and K = 8
421 Coherence We expect that the clinical activities in atopic could represent the similar clinical goal Take topic 5(the 5th row in Table 5) as the example It is observed thatthe result of LDA contains both surgery- and nursing care-related activities While in the result of CDG-LDA most ofthe clinical activities are about surgery The similar situations
Table 3 Statistics of our datasets
DiseaseTracenumber
Daynumber
Vocnumber
AvgLOS
MinLOS
MaxLOS
ICH 240 3204 752 14 2 34
IH 33 241 447 6 2 10
Input A dataset of D clinical daysTopic number K Hyper-parameters α and βIterations δ for updating relevant topic list
Output The multinomial distribution ϕ and θ1 Calculate IDF for each clinical activity2 Initialize the topic number upper bounder s v isin 1 K
for clinical activity with value v according to its IDF3 Initialize the relevant topic list κ v = 1 2hellip K for
clinical activity with value v4 Sample zgd for each group and Initialize all the count
parameters NkNkd and N v
k accordingly
5 for l = 1 to K do6 if l gt 1 then7 for v = 1 to V do8 if K minus l + 1 ge s v then9 Update κ v 10 for r = 1 to δ do11 for d = 1 to D do12 for g = 1 to Gd do13 k = zgd 14 Nk minus = Ag
d Nkd minus = Ag
d Nvk minus = Ag
d 15 Sample topic index kprime from κ v
according to (6)
16 Nkprime + = Agd N
kprimed + = Ag
d Nv
kprime + = Agd
17 Calculate and return ϕ (based on 8) and β
Algorithm 1 Gibbs sampling of CDG-LDA
7Journal of Healthcare Engineering
Table 4 Eight labeled topics of ICH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1 Imaging
(1) Doppler echocardiography (2) Dopplerultrasonography of left ventricular function(3) ventricular wall motion (4) transcranialDoppler (5) analysis of ultrasonic (6) color
Doppler ultrasonography of abdomen(7) X-radiography (8) digitized
photography (9) B mode ultrasoundand (10) film
Imaging
(1) Doppler echocardiography(2) Doppler ultrasonography of
left ventricular function (3) ventricularwall motion (4) transcranial Doppler(5) analysis of ultrasonic (6) color
Doppler ultrasonography of the abdomen(7) X-radiography (8) B mode ultrasound(9) digitized photography and (10) film
2 Medication
(1) Aceglutamide (2) sodium chloride(3) local infiltration anesthesia (4) etimicin(5) cerebrosidekinin (6) physical cooling(7) t-branch pipe (8) aerosol inhalation(9) ambroxol and (10) venous transfusion
Medication
(1) Pantoprazole (2) potassium magnesiumaspartate (3) mannitol (4) vitamin B6(5) glycerin fructose (6) vitamin C
(7) naloxone (8) piracetam (9) sodiumchloride and (10) glucose
3High-levelnursing care
(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) urinary meatus care (5) mouth care(6) skin care (7) hospital examining fee(8) mouth care package (9) ambulatory
blood pressure monitoring and(10) continuous oxygen inhalation
High-levelnursing care
(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) mouth care (5) urinary meatus care(6) skin care (7) continuous oxygeninhalation (8) mouth care package
(9) ambulatory blood pressure monitoringand (10) drainage pack
4Regular nursing
care andmedication
(1) Level II care (2) hospital examining fee(3) pantoprazole (4) vitamin B6
(5) mannitol (6) vitamin C (7) potassiummagnesium aspartate (8) venous transfusion
(9) sodium chloride and (10) glycerinfructose
Regularnursing care
(1) Level II care (2) venous transfusion(3) flusher (4) syringe (5) arterialvenous
cannula (6) therapy application(7) aceglutamide (8) intramuscular
injection (9) physical cooling venipunctureand (10) nifedipine
5Admissionexamination
(1) ECG (2) PLGA (3) fibrous protein(4) blood collection tube (5) hemostix
(6) washbasin (7) ECG event log(8) plasma prothrombin time assay
(9) activated partial thromboplastin timeand (10) prothrombin time assay
Admissionexamination
(1) Brain CT (2) venous sampling (3) ECG(4) hemostix (5) electrode (6) 12 channeldynamic electrocardiogram (7) washbasin(8) ECG event log (9) nasogastric and(10) evaluation of activities of daily living
6Biochemical
exam
(1) Urea assay (2) creatinine assay (3) uricacid assay (4) chloride assay (5) blood
cell analysis (6) potassium assay (7) sodiumassay (8) glucose assay (9) serum
bicarbonate assay and (10) excrement
Biochemicalexam
(1) Creatinine assay (2) urea assay(3) uric acid assay (4) blood cell
analysis (5) chloride assay (6) sodiumassay (7) potassium assay (8) serumbicarbonate assay (9) glucose assay
and (10) calcium assay
7Biochemical
exam
(1) Serum a-L-glucosidase assay (2) serum5primenucleotidase assay (3) plasma viscosityassay (4) glycosylated hemoglobin assay(5) whole blood viscosity (6) identification
of Rh blood group antigen (7) amylase assay(8) serum creatine kinase-MB isoenzyme
activity assay (9) lactate dehydrogenase assayand (10) serum alanine aminotransferase assay
Biochemicalexam
(1) Serum 5primenucleotidase assay (2) seruma-L-glucosidase assay (3) plasma viscosity
assay (4) serum creatine kinase-MBisoenzyme activity assay (5) glycosylated
hemoglobin assay (6) whole blood viscosity(7) identification of Rh blood group antigen(8) amylase assay (9) lactate dehydrogenaseassay and (10) serum total protein assay
8Biochemical
exam
(1) Serum aspartate aminotransferase assay(2) electrophoresis analysis of lactatedehydrogenase isoenzymes (3) urine
analysis (4) serum 7 pancreaticacyltransferase assay (5) serum alkalinephosphatase assay (6) serum alanineaminotransferase assay (7) urinalysis
(8) hepatitis A antibody assay (9) serumhigh-density lipoprotein cholesterol assayand (10) serum total cholesterol assay
Biochemicalexam
(1) Serum aspartate aminotransferase assay(2) serum albumin assay (3) urine sediment
quantitative analyze (4) inorganicphosphorus assay (5) electrophoresisanalysis of lactate dehydrogenase
isoenzymes (6) serum fructose amine assay(7) urine analysis (8) serum 7 pancreaticacyltransferase assay (9) serum alanineaminotransferase assay and (10) serum
alkaline phosphatase assay
8 Journal of Healthcare Engineering
can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice
We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree
NKQM N= 1KK
k=1
N
j=1 score Mkj log j + 1ZN
10
where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM
422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic
Table 5 Five labeled topics of IH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1Biochemical
exam
(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay
(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay
Biochemicalexam
(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay
(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay
(9) calcium assay and (10) magnesium assay
2Infectious
disease exam
(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay
(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy
Presurgicalpreparation
(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP
monitoring (9) cefuroxime and(10) infusion apparatus
3Admission
exam
(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines
(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin
(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen
Admissionexam
(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy
(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and
(10) analysis of ultrasonic
4Regular
nursing care
(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix
(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula
Regularnursing care
(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam
(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change
(9) levofloxacin and (10) middle dressing change
5Surgery
and regularnursing care
(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital
examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride
Surgery
(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application
(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring
Table 6 Topic coherence in terms of NKQMN
Dataset Method NKQM5 NKQM10 NKQM20
ICHLDA 08211 08004 07907
CDG-LDA 08448 08498 08235
IHLDA 07915 07830 07631
CDG-LDA 08316 08266 08116
9Journal of Healthcare Engineering
of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics
RE =visinΩ count v minus 1 11
whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the
(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result
Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics
423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects
(i) Examination
For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5
For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part
(ii) Treatment
Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance
3 7 11 15 19 3 7 11 15 19
0
20
40
60
80
100
ICH
0
50
100
150
200IH
LDA (N = 10) N = 10)LDA (N = 20) N = 20)
CDG-LDA (CDG-LDA (
Figure 4 Redundancy indicator RE on various topic number K and top size N
10 Journal of Healthcare Engineering
controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset
IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4
(iii) Nursing care
In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment
(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals
43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we
got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways
(i) ICH
The process can be easily divided into three stages
(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations
(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms
(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states
5 6 7 8 3
2 3
2
4
1 2 4 5
Figure 5 ICH process model
3 1
2
5 2
4
Figure 6 IH process model
11Journal of Healthcare Engineering
(ii) IH
Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery
(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP
5 Conclusion
In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation
One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy
Conflicts of Interest
The authors declare that there is no conflict of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC
References
[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003
[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013
[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014
[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014
[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016
[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016
[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001
[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005
[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009
[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003
[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008
[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009
[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012
[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015
[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013
[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013
[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with
12 Journal of Healthcare Engineering
visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015
[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012
[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015
[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014
[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016
[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009
[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014
13Journal of Healthcare Engineering
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal of
Volume 201
Submit your manuscripts athttpswwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
Table 4 Eight labeled topics of ICH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1 Imaging
(1) Doppler echocardiography (2) Dopplerultrasonography of left ventricular function(3) ventricular wall motion (4) transcranialDoppler (5) analysis of ultrasonic (6) color
Doppler ultrasonography of abdomen(7) X-radiography (8) digitized
photography (9) B mode ultrasoundand (10) film
Imaging
(1) Doppler echocardiography(2) Doppler ultrasonography of
left ventricular function (3) ventricularwall motion (4) transcranial Doppler(5) analysis of ultrasonic (6) color
Doppler ultrasonography of the abdomen(7) X-radiography (8) B mode ultrasound(9) digitized photography and (10) film
2 Medication
(1) Aceglutamide (2) sodium chloride(3) local infiltration anesthesia (4) etimicin(5) cerebrosidekinin (6) physical cooling(7) t-branch pipe (8) aerosol inhalation(9) ambroxol and (10) venous transfusion
Medication
(1) Pantoprazole (2) potassium magnesiumaspartate (3) mannitol (4) vitamin B6(5) glycerin fructose (6) vitamin C
(7) naloxone (8) piracetam (9) sodiumchloride and (10) glucose
3High-levelnursing care
(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) urinary meatus care (5) mouth care(6) skin care (7) hospital examining fee(8) mouth care package (9) ambulatory
blood pressure monitoring and(10) continuous oxygen inhalation
High-levelnursing care
(1) Level I care (2) ECG monitoring(3) blood oxygen saturation monitoring(4) mouth care (5) urinary meatus care(6) skin care (7) continuous oxygeninhalation (8) mouth care package
(9) ambulatory blood pressure monitoringand (10) drainage pack
4Regular nursing
care andmedication
(1) Level II care (2) hospital examining fee(3) pantoprazole (4) vitamin B6
(5) mannitol (6) vitamin C (7) potassiummagnesium aspartate (8) venous transfusion
(9) sodium chloride and (10) glycerinfructose
Regularnursing care
(1) Level II care (2) venous transfusion(3) flusher (4) syringe (5) arterialvenous
cannula (6) therapy application(7) aceglutamide (8) intramuscular
injection (9) physical cooling venipunctureand (10) nifedipine
5Admissionexamination
(1) ECG (2) PLGA (3) fibrous protein(4) blood collection tube (5) hemostix
(6) washbasin (7) ECG event log(8) plasma prothrombin time assay
(9) activated partial thromboplastin timeand (10) prothrombin time assay
Admissionexamination
(1) Brain CT (2) venous sampling (3) ECG(4) hemostix (5) electrode (6) 12 channeldynamic electrocardiogram (7) washbasin(8) ECG event log (9) nasogastric and(10) evaluation of activities of daily living
6Biochemical
exam
(1) Urea assay (2) creatinine assay (3) uricacid assay (4) chloride assay (5) blood
cell analysis (6) potassium assay (7) sodiumassay (8) glucose assay (9) serum
bicarbonate assay and (10) excrement
Biochemicalexam
(1) Creatinine assay (2) urea assay(3) uric acid assay (4) blood cell
analysis (5) chloride assay (6) sodiumassay (7) potassium assay (8) serumbicarbonate assay (9) glucose assay
and (10) calcium assay
7Biochemical
exam
(1) Serum a-L-glucosidase assay (2) serum5primenucleotidase assay (3) plasma viscosityassay (4) glycosylated hemoglobin assay(5) whole blood viscosity (6) identification
of Rh blood group antigen (7) amylase assay(8) serum creatine kinase-MB isoenzyme
activity assay (9) lactate dehydrogenase assayand (10) serum alanine aminotransferase assay
Biochemicalexam
(1) Serum 5primenucleotidase assay (2) seruma-L-glucosidase assay (3) plasma viscosity
assay (4) serum creatine kinase-MBisoenzyme activity assay (5) glycosylated
hemoglobin assay (6) whole blood viscosity(7) identification of Rh blood group antigen(8) amylase assay (9) lactate dehydrogenaseassay and (10) serum total protein assay
8Biochemical
exam
(1) Serum aspartate aminotransferase assay(2) electrophoresis analysis of lactatedehydrogenase isoenzymes (3) urine
analysis (4) serum 7 pancreaticacyltransferase assay (5) serum alkalinephosphatase assay (6) serum alanineaminotransferase assay (7) urinalysis
(8) hepatitis A antibody assay (9) serumhigh-density lipoprotein cholesterol assayand (10) serum total cholesterol assay
Biochemicalexam
(1) Serum aspartate aminotransferase assay(2) serum albumin assay (3) urine sediment
quantitative analyze (4) inorganicphosphorus assay (5) electrophoresisanalysis of lactate dehydrogenase
isoenzymes (6) serum fructose amine assay(7) urine analysis (8) serum 7 pancreaticacyltransferase assay (9) serum alanineaminotransferase assay and (10) serum
alkaline phosphatase assay
8 Journal of Healthcare Engineering
can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice
We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree
NKQM N= 1KK
k=1
N
j=1 score Mkj log j + 1ZN
10
where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM
422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic
Table 5 Five labeled topics of IH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1Biochemical
exam
(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay
(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay
Biochemicalexam
(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay
(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay
(9) calcium assay and (10) magnesium assay
2Infectious
disease exam
(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay
(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy
Presurgicalpreparation
(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP
monitoring (9) cefuroxime and(10) infusion apparatus
3Admission
exam
(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines
(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin
(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen
Admissionexam
(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy
(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and
(10) analysis of ultrasonic
4Regular
nursing care
(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix
(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula
Regularnursing care
(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam
(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change
(9) levofloxacin and (10) middle dressing change
5Surgery
and regularnursing care
(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital
examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride
Surgery
(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application
(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring
Table 6 Topic coherence in terms of NKQMN
Dataset Method NKQM5 NKQM10 NKQM20
ICHLDA 08211 08004 07907
CDG-LDA 08448 08498 08235
IHLDA 07915 07830 07631
CDG-LDA 08316 08266 08116
9Journal of Healthcare Engineering
of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics
RE =visinΩ count v minus 1 11
whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the
(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result
Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics
423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects
(i) Examination
For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5
For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part
(ii) Treatment
Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance
3 7 11 15 19 3 7 11 15 19
0
20
40
60
80
100
ICH
0
50
100
150
200IH
LDA (N = 10) N = 10)LDA (N = 20) N = 20)
CDG-LDA (CDG-LDA (
Figure 4 Redundancy indicator RE on various topic number K and top size N
10 Journal of Healthcare Engineering
controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset
IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4
(iii) Nursing care
In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment
(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals
43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we
got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways
(i) ICH
The process can be easily divided into three stages
(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations
(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms
(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states
5 6 7 8 3
2 3
2
4
1 2 4 5
Figure 5 ICH process model
3 1
2
5 2
4
Figure 6 IH process model
11Journal of Healthcare Engineering
(ii) IH
Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery
(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP
5 Conclusion
In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation
One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy
Conflicts of Interest
The authors declare that there is no conflict of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC
References
[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003
[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013
[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014
[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014
[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016
[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016
[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001
[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005
[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009
[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003
[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008
[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009
[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012
[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015
[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013
[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013
[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with
12 Journal of Healthcare Engineering
visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015
[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012
[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015
[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014
[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016
[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009
[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014
13Journal of Healthcare Engineering
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal of
Volume 201
Submit your manuscripts athttpswwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
can be found in 2nd 4th and 5th rows of ICH and 2nd row ofIH It shows that the result of approach has significant highercoherent degree than LDA This may be due to the high co-occurrence of these different kinds of clinical activities Forexample the nursing care activities are usually used withsome medication activities in one clinical day and they allhave high-frequency in patientsrsquo treatment Based on themutually independent topic inference procedure of LDA itis possible to assign the same topic to parts of the two kindsof activities in one clinical day such as 60 of nursing careactivities and 50 of medication activities got the same topicand the other activities got different topics By introducingthe topic assignment constraint we guarantee that the twokinds of activities in one clinical day would be assigned eitherthe same topic or different topics It can improve the consis-tency of the topic assignment which is more conformed tothe clinical practice
We proposed a user study to quantitatively measure thecoherence degree of different methods Three doctors weretrained with a short tutorial and invited to label the top 20clinical activities of each topic to very relevant (score 2)relevant (score 1) or irrelevant (score 0) The final score isdetermined by a majority voting strategy The kappa valuewhich is used to measure the interrater agreement is 073for ICH and 074 for IH We used NKQMN [23] as themetric to measure the coherence degree
NKQM N= 1KK
k=1
N
j=1 score Mkj log j + 1ZN
10
where K is the topic number Mk j is the jth clinical activitygenerated by methodM for topic k and ZN is the normaliza-tion factor Higher value of NKQMN implies better rankingresult As shown in Table 6 the performance of CDG-LDA are remarkably better than LDA across various depthof NKQM
422 Redundancy Given two discovered topics we expectthat they would not look alike which means that the twotopics should contain few same clinical activities Thus theredundancy indicator focuses on the distinctive characteristic
Table 5 Five labeled topics of IH by different methods
Number Topic tag LDA Topic tag CDG-LDA
1Biochemical
exam
(1) Blood cell analysis (2) glucose assay(3) creatinine assay (4) urea assay(5) chloride assay (6) sodium assay
(7) potassium assay (8) uric acid assay(9) calcium assay and (10) magnesium assay
Biochemicalexam
(1) Blood cell analysis (2) glucose assay(3) sodium assay (4) chloride assay
(5) potassium assay (6) uric acid assay(7) creatinine assay (8) urea assay
(9) calcium assay and (10) magnesium assay
2Infectious
disease exam
(1) Hepatitis A antibody assay (2) treponemapallidum antibody assay (3) HIV antibodyassay (4) hepatitis B core antibody assay
(5) hepatitis Be antibody assay (6) hepatitis Cantibody assay (7) hepatitis B surfaceantibody assay (8) hepatitis B surfaceantigen assay (9) urine analysis and(10) urinary sediment microscopy
Presurgicalpreparation
(1) Skin preparation package (2) skinpreparation (3) disinfection fee (4) hygienicmaterial (5) washbasin (6) intramuscularinjection (7) health education (8) BP
monitoring (9) cefuroxime and(10) infusion apparatus
3Admission
exam
(1) ECG (2) ECG monitoring (3) ABOsubtype assay (4) urine routines
(5) prothrombin time assay (6) activatedpartial thromboplastin time (7) plasmaprothrombin time assay (8) washbasin
(9) Rh subtype assay and (10) color Dopplerultrasonography of the abdomen
Admissionexam
(1) ECG (2) ECG monitoring (3) urineroutines (4) urinary sediment microscopy
(5) ECG event log (6) abdomen X-ray (7) chestX-ray (8) color Doppler ultrasonographyof superficial organ (9) color Dopplerultrasonography of abdomen and
(10) analysis of ultrasonic
4Regular
nursing care
(1) Level II care (2) hospital examining fee(3) level III care (4) infusion apparatus(5) venous transfusion (6) hemostix
(7) syringe (8) special hemostix (9) dressingchange and (10) arterialvenous cannula
Regularnursing care
(1) Level II care (2) level III care (3) dressingchange (4) fat-soluble vitamin (5) cefotiam
(6) arterialvenous cannula (7) venoustransfusion (8) small dressing change
(9) levofloxacin and (10) middle dressing change
5Surgery
and regularnursing care
(1) Level II care (2) endotherm knife(3) mask (4) venous transfusion (5) hospital
examining fee (6) glucose (7) syringe(8) ECG monitoring (9) blood oxygensaturation and (10) sodium chloride
Surgery
(1) Mask (2) endotherm knife (3) surgerypackage (4) blood oxygen saturation (5) lumbaranesthesia (6) epidural anesthesia (7) application
(8) oxygen therapy (9) IH repair and(10) anesthesia monitoring
Table 6 Topic coherence in terms of NKQMN
Dataset Method NKQM5 NKQM10 NKQM20
ICHLDA 08211 08004 07907
CDG-LDA 08448 08498 08235
IHLDA 07915 07830 07631
CDG-LDA 08316 08266 08116
9Journal of Healthcare Engineering
of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics
RE =visinΩ count v minus 1 11
whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the
(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result
Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics
423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects
(i) Examination
For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5
For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part
(ii) Treatment
Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance
3 7 11 15 19 3 7 11 15 19
0
20
40
60
80
100
ICH
0
50
100
150
200IH
LDA (N = 10) N = 10)LDA (N = 20) N = 20)
CDG-LDA (CDG-LDA (
Figure 4 Redundancy indicator RE on various topic number K and top size N
10 Journal of Healthcare Engineering
controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset
IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4
(iii) Nursing care
In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment
(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals
43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we
got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways
(i) ICH
The process can be easily divided into three stages
(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations
(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms
(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states
5 6 7 8 3
2 3
2
4
1 2 4 5
Figure 5 ICH process model
3 1
2
5 2
4
Figure 6 IH process model
11Journal of Healthcare Engineering
(ii) IH
Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery
(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP
5 Conclusion
In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation
One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy
Conflicts of Interest
The authors declare that there is no conflict of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC
References
[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003
[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013
[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014
[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014
[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016
[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016
[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001
[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005
[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009
[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003
[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008
[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009
[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012
[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015
[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013
[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013
[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with
12 Journal of Healthcare Engineering
visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015
[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012
[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015
[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014
[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016
[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009
[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014
13Journal of Healthcare Engineering
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal of
Volume 201
Submit your manuscripts athttpswwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
of each topic We measure the redundancy (RE see (11)) bycalculating the overlapping degree between all the topics
RE =visinΩ count v minus 1 11
whereΩ is the unique clinical activities in and count vis the number of the occurrence of clinical activity v in the
(top N clinical activities per topic K topics) In the pre-vious example in Figure 1(b) the RE value among all thethree topics is 2 (suppose N gt 3) Lower RE represents lessredundancy and hence a better clustering result
Figure 4 shows the comparison between CDG-LDA andLDA It is observed that CDG-LDA consistently outperformsLDA across various topic number K and top size N This isdue to the incorporation of topic correlation limitation Byupdating the relevant topic list during the iterations of Gibbssampling each clinical activity would strongly correlate tolimited topics especially the informative clinical activitiesAs a sequence a clinical activity would not rank high in manytopics which lead to a low redundancy degree among allthe topics
423 Coverage Coverage refers to the ability of discoveringimportant clinical activities for a specific disease Giventwo high coherent and low redundant topics (syringeinfusion apparatus and arterialvenous cannula hemostix)for ICH both of them are about the basic clinical equip-ment However these clinical activities are not criticalenough for the treatment of ICH We aim to find out allthe key clinical activities in We used the necessaryrecommendatory clinical behaviors in the Chinese NationalClinical Pathway and ChineseAHAASAEHS guidelineas our benchmark The overlapping degree betweenand the benchmark is adopted as the coverage indicatorHere we took the listed in Tables 4 and 5 whichmeans the top size N = 10 to analyze the coverage fromthree aspects
(i) Examination
For ICH the core examinations contains bloodurineroutine ECG and biochemical and imaging examinationsFor the first three both LDA and CDG-LDA have discoveredmajority of the related clinical activities in topics 5 6 7 and8 While for the last one LDA failed to find out brain CTwhich is the most effective imaging examination for ICHIn CDG-LDA brain CT ranked high in topic 5
For IH a patient needs the similar examinations toICH except the imaging test In topics 1 and 3 of thetwo methods we can find the clinical activities aboutbloodurine routine ECG and biochemical examinationsAn interesting observation is that the topics in the 2ndrow got by LDA and CDG-LDA are quite different InLDA the activities in topic 1 are mainly about the infectiousdisease assay which is an important part of biochemicalexamination While topic 2 of CDG-LDA focuses on thepresurgical preparation Although LDA got better perfor-mance in infectious disease examinations we deem thatthe result of CDG-LDA is more suitable for the clusteringof IH clinical behaviors There are two reasons (1) Theinfectious disease assay-related activities can be found inthe top 20 of topic 1 in CDG-LDA They all belonged tothe biochemical examinations which will always be pre-scribed together in practice so that it is reasonable toput them in the same topic (2) Topic 2 about presurgicalpreparation is of great importance for IH We would detailthis in the following part
(ii) Treatment
Due to the ICH dataset is extracted from the NeurologyDepartment rather than the Neurosurgery Department thetreatment activities are mainly medical instead of surgicalAccording to the drug function the commonly usedmedication for ICH can be divided into four categoriesreducing ICP (dehydration) keeping electrolyte balance
3 7 11 15 19 3 7 11 15 19
0
20
40
60
80
100
ICH
0
50
100
150
200IH
LDA (N = 10) N = 10)LDA (N = 20) N = 20)
CDG-LDA (CDG-LDA (
Figure 4 Redundancy indicator RE on various topic number K and top size N
10 Journal of Healthcare Engineering
controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset
IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4
(iii) Nursing care
In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment
(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals
43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we
got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways
(i) ICH
The process can be easily divided into three stages
(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations
(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms
(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states
5 6 7 8 3
2 3
2
4
1 2 4 5
Figure 5 ICH process model
3 1
2
5 2
4
Figure 6 IH process model
11Journal of Healthcare Engineering
(ii) IH
Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery
(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP
5 Conclusion
In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation
One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy
Conflicts of Interest
The authors declare that there is no conflict of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC
References
[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003
[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013
[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014
[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014
[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016
[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016
[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001
[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005
[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009
[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003
[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008
[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009
[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012
[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015
[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013
[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013
[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with
12 Journal of Healthcare Engineering
visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015
[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012
[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015
[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014
[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016
[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009
[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014
13Journal of Healthcare Engineering
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal of
Volume 201
Submit your manuscripts athttpswwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
controlling blood pressure and preventing complications(like dysphagia and aspiration nervous system injurybacterial infection etc) Most of them have been coveredby both LDA and CDG-LDA However LDA missed animportant dehydration drug piracetam which is of highfrequency in the dataset
IH is a typical surgical disease so that the treatmentactivities are mainly about the surgery It is observed thatboth LDA and CDG-LDA contained the surgery-relatedclinical activities in topic 5 However LDA failed to discoverthe most critical surgical activity IH repair and anesthesia inthe top 10 of topic 5 This is caused by the high redundancybetween topics 4 and 5 of the LDA It made the ranking ofthese important surgical activities lower than the nursingcare activities which are of high-frequency Moreover aswe discussed before few of presurgical activities have beenfound by LDA By limiting the topic number of eachclinical activity and adjusting the day-activity distributioncalculation CDG-LDA successfully discovered these clinicalactivities in topic 4
(iii) Nursing care
In both ICH and IH datasets the frequency of the clinicalactivities about nursing care are always the highest Hencethe two topic modeling methods got good performance indiscovering this kind of activities including different levelnursing care and various nursing equipment
(1) Discussion of Topic Quality From the abovementionedthree aspects about coherence redundancy and coveragewe can tell that CDG-LDA demonstrates significantly betterperformance than the original LDA in discovering latenttopics from clinical data The extracted high-quality topicscan be suitably interpreted as the clinical goals
43 Process Model Demonstration By using a topic label(contains several topics) to represent each clinical day we
got a series of topic-based sequences In this section wedemonstrate the process models derived from thesesequences by using the process mining methods proposedin [6] Note that we combined the topics about biochemicalexaminations which are always used together for the sakeof discussion Figures 5 and 6 show the result model of ICHand IH respectively We compared them with the ChineseNational Clinical Pathways
(i) ICH
The process can be easily divided into three stages
(1) Admission (topic (5) topic (6 7 and 8)) the patientwould accept a series of examinations for making adefinite diagnosis including brain CT ECG and bio-chemical assays Note that imaging examinations arenot required for all the traces This is due to the factthat the clinical activities in our imaging topic (topic(1)) such as CTA and MRA are mainly used as theauxiliary means for the diagnosis of ICH Usuallythe patients with severe symptoms or comorbiditiesneed further imaging examinations
(2) Treatment (topic (3) topic (2 3) topic (4) and topic(2)) After various examinations the ICH patientswould receive nursing care and take medicationsfor recovery Because of the urgency of ICH drugsand high-level nursing care are firstly used torelieve and monitor the symptoms When symptomsbecome stable the nursing level may be changed toregular As an internal department various medica-tions are used for ICH treatment according to thepatientsrsquo symptoms
(3) Re-examination (topic (5) topic (2 4 and 5) andtopic (1)) During the treatment stage related exam-inations are repeatedly needed for confirming thepatientsrsquo states
5 6 7 8 3
2 3
2
4
1 2 4 5
Figure 5 ICH process model
3 1
2
5 2
4
Figure 6 IH process model
11Journal of Healthcare Engineering
(ii) IH
Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery
(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP
5 Conclusion
In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation
One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy
Conflicts of Interest
The authors declare that there is no conflict of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC
References
[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003
[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013
[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014
[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014
[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016
[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016
[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001
[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005
[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009
[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003
[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008
[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009
[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012
[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015
[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013
[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013
[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with
12 Journal of Healthcare Engineering
visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015
[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012
[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015
[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014
[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016
[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009
[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014
13Journal of Healthcare Engineering
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal of
Volume 201
Submit your manuscripts athttpswwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
(ii) IH
Similar to ICH the IH patients would firstly receivevarious examinations Then a series of presurgical activi-ties are used for them Note that some patients wouldhave surgery in the same clinical day to the presurgicalpreparations (topic (5 2)) while others may accept sur-gery in the next clinical day (topic (2)) The average levelof nursing care for IH is lower than ICH And the anti-bacterial drugs are used during the surgery and nursingcare period It is worth mentioning that the majority ofIH patients have undergone the surgical treatment whileit still existed a small number of IH patients have notreceived surgery By careful analysis we found that thesepatients are mainly the infants or the elders who are notappropriate for surgery
(1) Discussion of Process Model It is observed that the topic-based clinical process model is of great conciseness andinterpretability for CPM We can conveniently draw theexecution CP from the clinical historical data and identifythe characteristics compared to expert-designed CP
5 Conclusion
In this paper we proposed a novel topic modeling approachto discover latent topics as the clinical goals for CPM Thekey idea is to incorporate clinical practice into the generativeprocess of LDA to improve the topic quality Topic assign-ment constraint restricts that all the same clinical activitiesin one clinical day should be assigned the same topic Topiccorrelation limitation incorporates informativeness featureto adaptively adjust the ranking of the clinical activities in eachtopic Process mining methods are used on these topic-basedsequences instead of the detailed clinical activities Theexperimental results show that our approach outperformsthe original LDA in coherence redundancy and coverageAnd the resultant topic-based process models are of greatconciseness and interpretability for CP representation
One extension of this work is to find a more suitablemeasure as the informativeness indicator to improve theranking adjustment strategy
Conflicts of Interest
The authors declare that there is no conflict of interestregarding the publication of this paper
Acknowledgments
This work was supported by the National Key TechnologyRampD Program (No 2015BAH14F02) and Project 61325008(Mining and Management of Large Scale Process Data)which is supported by the NSFC
References
[1] D M Blei A Y Ng and M I Jordan ldquoLatent Dirichletallocationrdquo The Journal of Machine Learning Research vol 3no 1 pp 993ndash1022 2003
[2] Z Huang X Lu and H Duan ldquoLatent treatment patterndiscovery for clinical processesrdquo Journal of Medical Systemsvol 37 no 2 pp 1ndash10 2013
[3] Z Huang W Dong L Ji C Gan X Lu and H DuanldquoDiscovery of clinical pathway patterns from event logs usingprobabilistic topic modelsrdquo Journal of Biomedical Informaticsvol 47 no 1 pp 39ndash57 2014
[4] Z Huang W Dong P Bath L Ji and H Duan ldquoOn mininglatent treatment patterns from electronic medical recordsrdquoData Mining and Knowledge Discovery vol 40 no 1pp 1ndash36 2014
[5] Z Huang W Dong L Ji C He and H Duan ldquoIncorporatingcomorbidities into latent treatment pattern mining for clin-ical pathwaysrdquo Journal of Biomedical Informatics vol 59no 1 pp 227ndash239 2016
[6] X Xu T Jin Z Wei C Lv and J Wang ldquoTCPM topic-basedclinical pathway miningrdquo in 2016 IEEE First InternationalConference on Connected Health Applications Systems andEngineering Technologies (CHASE) pp 292ndash301 IEEEWashington DC USA June 2016
[7] F-r Lin S-c Chou S-m Pan and Y-m Chen ldquoMining timedependency patterns in clinical pathwaysrdquo InternationalJournal of Medical Informatics vol 62 no 1 pp 11ndash25 2001
[8] F-r Lin L-s Hsieh and S-m Pan ldquoLearning clinical path-way patterns by hidden Markov modelrdquo in Proceedings ofthe 38th Annual Hawaii International Conference on SystemSciences 2005 (HICSSrsquo05) IEEE p 142a January 2005
[9] R S Mans M Schonenberg M Song W M van derAalst and P J Bakker ldquoApplication of Process Mining inHealthcaremdashA Case Study in a Dutch Hospitalrdquo BiomedicalEngineering Systems and Technologies vol 25 no 1pp 425ndash438 2009
[10] A Weijters and W M Van der Aalst ldquoRediscoveringworkflow models from event-based data using little thumbrdquoIntegrated Computer-Aided Engineering vol 10 no 2pp 151ndash162 2003
[11] M Lang T Buumlrkle S Laumann and H-U Prokosch ldquoProcessmining for clinical workflows challenges and currentlimitationsrdquo in E-Health Beyond the Horizon Get IT ThereProceedings of MIE2008 the XXIst International Congress ofthe European Federation for Medical Informatics IOS Pressp 229 2008
[12] J Ghattas M Peleg P Soffer and Y Denekamp ldquoLearning thecontext of a clinical processrdquo in Business Process ManagementWorkshops pp 545ndash556 Springer Berlin Heidelberg 2009
[13] A Rebuge and D R Ferreira ldquoBusiness process analysis inhealthcare environments a methodology based on processminingrdquo Information Systems vol 37 no 2 pp 99ndash116 2012
[14] Y Zhang R Padman and N Patel ldquoPaving the cowpathlearning and visualizing clinical pathways from electronichealth record datardquo Journal of Biomedical Informaticsvol 58 no 1 pp 186ndash197 2015
[15] G T Lakshmanan S Rozsnyai and F Wang ldquoInvestigatingclinical care pathways correlated with outcomesrdquo in Busi-ness Process Management pp 323ndash338 Springer BerlinHeidelberg 2013
[16] Z Huang X Lu H Duan and W Fan ldquoSummarizing clinicalpathways from event logsrdquo Journal of Biomedical Informaticsvol 46 no 1 pp 111ndash127 2013
[17] A Perer F Wang and J Hu ldquoMining and exploringcare pathways from electronic medical records with
12 Journal of Healthcare Engineering
visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015
[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012
[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015
[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014
[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016
[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009
[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014
13Journal of Healthcare Engineering
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal of
Volume 201
Submit your manuscripts athttpswwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
visual analyticsrdquo Journal of Biomedical Informatics vol 56no 1 pp 369ndash378 2015
[18] U Kaymak R Mans T van de Steeg and M Dierks ldquoOnprocess mining in health carerdquo in 2012 IEEE InternationalConference on Systems Man and Cybernetics (SMC)pp 1859ndash1864 IEEE South Korea October 2012
[19] C Fernandez-Llatas A Martinez-Millana A Martinez-Romero J M Benedi and V Traver ldquoDiabetes care relatedprocess modelling using process mining techniques Lessonslearned in the application of interactive pattern recognitioncoping with the spaghetti effectrdquo in Engineering in Medicineand Biology Society (EMBC) 2015 37th Annual InternationalConference of the IEEE pp 2127ndash2130 IEEE Italy August2015
[20] A Dagliati L Sacchi C Cerra et al ldquoTemporal data miningand process mining techniques to identify cardiovascularrisk-associated clinical pathways in type 2 diabetes patientsrdquoin 2014 IEEE-EMBS International Conference on Biomedicaland Health Informatics (BHI) pp 240ndash243 IEEE Spain June2014
[21] X Xu T Jin and J Wang ldquoSummarizing patient dailyactivities for clinical pathway miningrdquo in 2016 IEEE 18thInternational Conference on e-Health Networking Applica-tions and Services (Healthcom) pp 1ndash6 IEEE GermanySeptember 2016
[22] J Chang S Gerrish C Wang J L Boyd-Graber andD M Blei ldquoReading tea leaves how humans interprettopic modelsrdquo Advances in Neural Information ProcessingSystems pp 288ndash296 2009
[23] M Danilevsky C Wang N Desai X Ren J Guo and J HanldquoAutomatic construction and ranking of topical keyphrases oncollections of short documentsrdquo in Proceedings of the 2014SIAM International Conference on Data Mining (SDMrsquo14)pp 398ndash406 Society for Industrial and Applied Mathematics2014
13Journal of Healthcare Engineering
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal of
Volume 201
Submit your manuscripts athttpswwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal of
Volume 201
Submit your manuscripts athttpswwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of