Inferring Community Structure in Healthcare Forums. An empirical study

13
Inferring Community Structure in Healthcare Forums An Empirical Study T. Chomutare; E. Årsand; L. Fernandez-Luque; J. Lauritzen; G. Hartvigsen Methods Inf Med 2013; 52: 160–167 doi: 10.3414/ME12-02-0003 Cite as: T. Chomutare, E. Arsand, L. Fernandez- Luque, J. Lauritzen, G. Hartvigsen Inferring community structure in healthcare forums. An empirical study. Methods Inf Med. 2013; 52(2): 160–167. Published online 2013 Feb 8. doi: 10.3414/ME12-02-0003. https://www.ncbi.nlm.nih.gov/pubmed/2339228 2

Transcript of Inferring Community Structure in Healthcare Forums. An empirical study

Page 1: Inferring Community Structure in Healthcare Forums. An empirical study

Inferring Community Structure in

Healthcare ForumsAn Empirical StudyT. Chomutare; E. Årsand; L. Fernandez-Luque; J. Lauritzen; G.

HartvigsenMethods Inf Med 2013; 52: 160–167 doi: 10.3414/ME12-02-0003Cite as:

T. Chomutare, E. Arsand, L. Fernandez-Luque, J. Lauritzen, G. HartvigsenInferring community structure in healthcare forums. An empirical study. Methods Inf Med. 2013; 52(2): 160–167. Published online 2013 Feb 8. doi: 10.3414/ME12-02-0003. https://www.ncbi.nlm.nih.gov/pubmed/23392282

Page 2: Inferring Community Structure in Healthcare Forums. An empirical study

Introduction

Ma X, Chen G, Xiao J. Analysis of an online health social network. In: Proceedings of the 1st ACM International Health Informatics Symposium; Arlington, Virginia, USA. 1883035: ACM; 2010. ppBurton S, Morris R, Dimond M, Hansen J, Giraud-Carrier C, West J, et al. Public health community mining in YouTube. In: Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium; Miami, Florida, USA. 2110376: ACM;2012. pp 81–90.

In this work, it is examined whether interactions in healthcare forums can be represented as networks ; Diabetes taken as case study.

Patients must rely on peers for emotional, empathetic and practical support and Internet forums are one of the most popular social media for self-help.

Unlike most social networking websites, relationships in forums are implied. These relationships are encoded in large datasets of forum threads-and-comments dynamics, and in this work we use network analysis to decipher these relationships.An interesting application by Ma et al. analyzed a healthcare forum for weight changes and the influence it had on the weight of people in relationship circles, and Burton extended the analysis to video social network (YouTube) interactions for public health.

The specific aims of the study are to i) represent interactions in diabetes forums as networks, ii) detect and explore community structures in the networks, and iii) propose a quality measure characterizing the discovered communities.

Page 3: Inferring Community Structure in Healthcare Forums. An empirical study

Methods

Acquiring datasets of

forum activity Cleaning the

data Modeling a

network Detect

communities

Assess their Structure using

Quality Measures

Procedure for the study:

It is proposed to use two well-known community detection algorithms that deal very well with large datasets; Greedy Optimization(GO) and the Affinity Propagation(AP) algorithms.

GO is based on the Girvan-Newman algorithm; hierarchical agglomeration. The algorithm is xtremely fast and suitable for large networks

1 The AP algorithm is based on messagepassing, where an initial data set is chosen at random and then refined in iterations.

2

Page 4: Inferring Community Structure in Healthcare Forums. An empirical study

• Web data extraction program was developed in Python to crawl five diabetes forums on the Internet, comprising Spanish and English forums, and two of them were dedicated to juvenile diabetes.

• The forums have a total in excess of 140,000 registered users and over 1.6 million posts.

• An edge is created between a user(node) who creates a topic (edge) and a user who comments on the topic.

• As an illustration, from Figure 1, we can observe how a typical forum network could develop over time.

1

2

3

Figure 1 A network of thread creation and comments is developing over time, where at time to (a) the initial network is established, and it develops progressively through time T1 (b) TO A Blossoming cluster at time T2 (c).

Data Acquisition and Network Modeling

Page 5: Inferring Community Structure in Healthcare Forums. An empirical study

F1 F2 F3 F4 F5

Language English English English Spanish English

Target group Open Open Juvenile Juvenile OpenRegistered users 35589 72338 10527 1681 21968

Year from diagnosis 29183(82%) 5787(8%) x x 6590(30%)

Diabetes type 28471(80%) 54977(76%) x x 6590(30%)

Gender 18506(52%) 15191(21%) x x x

Age 1779(5%) 7234(10%) x x

HbA1c 1068(3%) 3617(5%) x x x

Location 3559(10%) 7234(10%) 5579(53%) 740(44%) 16256(74%)

• To analyze the network structure and topology, well-known community detection algorithms was used.

• Structure variables were observed such as the diameter and characteristic path length at the network level and centrality measures at the node level.

• As a guideline, Table 1 provided below shows the attributes that we managed to collect by crawling the forums and user profiles for publicly disclosed information.

Table 1 The percentage of users who disclosed personal data in the forums, where x = data either unavailable or could not extracted

Newman ME. Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America 2006; 103 (23): 8577–8582. Epub 2006/05/26.Wasserman S, Faust K. Social Network Analysis: Methods and Applications. Cambridge University Press; 1994.

Community Detection and Attribute Analysis

Page 6: Inferring Community Structure in Healthcare Forums. An empirical study

• In this study, we relied on the modularity to validate the structure of the networks. • We propose a quality function based on the homophily concept, where we assume the general hypothesis that

similar users tend to cluster together.

The three steps in the quality assessment procedure:

Construct a set of neighbors from the top community.

Construct a set of neighbors

from user interaction similarity analysis

Compare the neighbor sets.

McPherson M, Lovin L, Cook J. Birds of a Feather: Homophily in Social Networks. Annual Review of Sociology 2001; 27 (1): 415–444.

Quality Assessment Heuristics

Page 7: Inferring Community Structure in Healthcare Forums. An empirical study

ResultsControl

(12) F1 F2 F3 F4 F5

Number of nodes 7569 9679 16404 5553 470 2948

Number of edges 103592 109695 467677 767413 42924 75027

Users who posted (%) - 27 23 53 28 13

Clustering coefficient 0.173 0.181 0.297 0.232 0.154 0.233

Network diameter 12 11 9 9 8 13

Characteristic path length 3.320 3.6 3.3 2.9 3.2 3.9

Average no.of neighbors _ 13 19 62 10 12

Number of clusters(AP/GO) _ 360(171) 2600(242) 1290(115) 193(23) 670(22)

Average clusters size (AP/GO) _ 27(57) 6(68) 4(48) 2(20) 4(134)

Maximum cluster size(AP/GO) _ 1976(391

1)2163(8450

) 470(2597) 239(131) 1351(1372)

Modularity(AP/GO) _ 0.07(0.34) 0.14(0.35) 0.02(0.28) 0.40(0.28) 0.35(0.35)

3.1 Nature of the Diabetes Networks

• Table 2 show the basic characteristics of the five forums we studied, and reveal several interesting points.

• Community nodes connect mostly with the central nodes, resulting in the distinctly dominant star topology with maximum centralization

• The average number of neighbors is not as high as could be expected from conventional types of social networks

• Very low user participation rates in all the forums, suggesting high levels of activity among the few users who do participate actively.

Table 2 Basic network characteristics from the 5 datasets and the community detection results. AP = Affinity propagation algoritham, and GO = Greedy optimization algoritham.the control column contains data from a previously published similar (12)

Page 8: Inferring Community Structure in Healthcare Forums. An empirical study

• Figure 3 shows the clustering coefficient distribution for forums F1 and F2.• Figure 4 Visualizes the community structures & illustrates the dominant star topology.

Figure 4 is a zoomed –in figure of some of forum f1 detected communities based on the greedy optimization algorithm. The node size is related to the node’s in-degree, and the colors are: blue= no data provided by the, green=0-1 year after diagnosis, red=2-10 years diagnosis, and black= more than 10 years after diagnosis.

Figure 3 The clustering coefficient distribution for two forums f1 (3a) and f2 (3b). The bold line in fig. 3b is the Power-law distribution (y=axb) with a correlation of 0.863 to the network’s distribution

Community Structure and Visualization

Page 9: Inferring Community Structure in Healthcare Forums. An empirical study

• Figure 5 shows how easy it is to isolate users who may be struggling with the disease. • A strong edge between a node with high in-degree and another with low in-degree means the latter initiates most

ties with the former, suggesting a needy user or someone highly motivated and involved in selfcare.• A central node with very low in-degree suggests a very hyperactive user who initiates ties with several users.

Figure 5 A zoomed – in figure of some of forum F3 detected communities based on the Greedy optimization algorithm. The node size is based on the node’s In-degree. The edges are weighted and the bold edge represents a high weight value.

Community Structure and Visualization

Page 10: Inferring Community Structure in Healthcare Forums. An empirical study

• The degree-based homophily measures (measured between –1 and +1) are negative for all the networks, suggesting dissortativity.

• Results suggest node degree is not a significant factor for cohesion in diabetes forums.• Homophily based on degree seems inadequate to describe the nature of the studied diabetes social networks, and

the proposed new measure that is based on forum interaction and activity could be complementary.

TC (CB) F1 F2 F3 F4 F5

Greedy Optimization (user interaction) 0.6 (0.8) 0.6(0.9) 0.5(0.7) 0.6(0.7) 0.7(0.9)

Affinity propagation (user interaction) 0.5 (0.6) 0.5(0.7) 0.5(0.6) 0.4(0.6) 0.7(0.8)

Homophily (degree) -0.16 -0.17 -0.18 -0.14 -0.20

Homophily (year-since-diagnosis) 0.20 - - - -

Homophily (diabetes-type) 0.42 0.35 - - -

Quality of the Discovered Communities

Table 3 The results of the quality function (measured between 0 and + 1 ) on the 5 forums, using the two community detection algorithms and two similarity measures. TC = Tanimoto coefficient similarity and CB = City Block similarity, as well as homophily measures based on several node attributes

Page 11: Inferring Community Structure in Healthcare Forums. An empirical study

There might have been some private dialogue that could alter the network

density

1The unavailability of

additional user attribute data which limited analysis to only two attributes that had sufficient

data.

2Some forum data may be phony or

forged, and missing or erroneous data has been shown to

affect critical measures

3

Limitations

Page 12: Inferring Community Structure in Healthcare Forums. An empirical study

• This study has shown how implicit networks can be modeled from forum interaction.

• Further, we showed that the resulting networks actually do make sense and correlate with some attribute-based homophily measures.

• Current empirical observations strengthen the basic science that supports future analysis and reasoning about diabetes patients as users of the medical web.

Conclusions

Page 13: Inferring Community Structure in Healthcare Forums. An empirical study

Read the full paper

Cite as: T. Chomutare, E. Arsand, L. Fernandez-Luque, J. Lauritzen, G. HartvigsenInferring community structure in healthcare forums. An empirical study. Methods Inf Med. 2013; 52(2): 160–167. Published online 2013 Feb 8. doi: 10.3414/ME12-02-0003. https://www.ncbi.nlm.nih.gov/pubmed/23392282

https://www.shutterstock.com/pic-342415589/stock-photo-doctor-meeting-teamwork-diagnosis-healthcare-concept.html

https://www.shutterstock.com/pic-461518708/stock-photo-health-wellbeing-wellness-vitality-healthcare-concept.html

https://www.shutterstock.com/pic-354007253/stock-photo-business-team-concept-inflation.html

Image credits