Post on 23-Sep-2020
1
Construction and Mining of Heterogeneous Information Networks: Will It Be a Key to
Web-Aged Information Management and Mining
Jiawei Han Department of Computer Science
University of Illinois at Urbana-Champaign Collaborated with many students, especially Yizhou Sun, Ming Ji, Chi Wang, Ahmed El-Kishky, Bo Zhao
Acknowledgements: NSF, ARL, NASA, AFOSR (MURI), ARO, Microsoft, IBM, Yahoo!, LinkedIn, Google & Boeing
July 3, 2014
2
Outline Why Heterogeneous Information Networks?
On the Power of Mining Structured Heterogeneous Info. Networks
Clustering & Classification: A Ranking-Based Approach
Similarity Search & Meta Path in Heterogeneous Info. Net
Prediction & Recommendation: A Meta Path-Based Approach
From Unstructured Data to Structured Networks
From Unstructured Text to Structured Hierarchies
Type Extraction and Role Discovery: Enrich Network Structures
Truth Analysis: Enhance Quality of Structured Network
Looking Forward to the Future
Where There Is Information, There Are Networks!
Social Networking Websites Biological Network: Protein Interaction
Research Collaboration Network Product Recommendation Network via Emails 3
The Real World: Heterogeneous Networks Multiple object types and/or multiple link types
Venue Paper Author DBLP Bibliographic Network The IMDB Movie Network
Actor Movie
Director
Movie Studio
Homogeneous networks are information loss projection of heterogeneous networks!
The Facebook Network
Directly mining information-richer heterogeneous networks 4
Structured Heterogeneous Network Modeling Leads to the New Power of Data Mining! DBLP: A Computer Science bibliographic database
Knowledge hidden in DBLP Network Mining Functions How are CS research areas structured? Clustering
Who are the leading researchers on Web search? Ranking
What are the most essential terms, venues, authors in AI? Classification + Ranking
Who are the peer researchers of Jure Leskovec? Similarity Search
Whom will Christos Faloutsos collaborate with? Relationship Prediction
Which types of relationships are most influential for an author to decide her topics?
Relation Strength Learning
How was the field of Data Mining emerged or evolving? Network Evolution
Which authors are rather different from his/her peers in IR? Outlier/anomaly detection
A sample publication record in DBLP (>2 M papers, >0.7 M authors, >10 K venues), …
5
Power of het. network modeling: Treat Author, Venue, Term & Paper all first-class citizens!
6
Outline Why Heterogeneous Information Networks?
On the Power of Mining Structured Heterogeneous Info. Networks
Clustering & Classification: A Ranking-Based Approach
Similarity Search & Meta Path in Heterogeneous Info. Net
Prediction & Recommendation: A Meta Path-Based Approach
From Unstructured Data to Structured Networks
From Unstructured Text to Structured Hierarchies
Type Extraction and Role Discovery: Enrich Network Structures
Truth Analysis: Enhance Quality of Structured Network
Looking Forward to the Future
7
Basic functions for networks: Clustering, classification & ranking Why not integrate ranking with clustering & classification?
High-ranked objects should be more important (thus more weighted) in a cluster/class than low-ranked ones
Ranking will make more sense within a particular cluster/class Einstein in physics vs. Turing in Coputer Science
Integrate ranking with clustering/classification in the same process Ranking, as the feature, is conditional (i.e., relative) to a specific
cluster/class Ranking and clustering/classification mutually enhance each other Ranking-based clustering: RankClus [EDBT’09], NetClus [KDD’09],
PathSelClus [KDD’12] Ranking-based classification: GNetMine (for classification)
[ECMLPKDD’10], RankClass [KDD’11]
Ranking-Based Clustering & Classification
Initialization Randomly partition
Repeat Ranking
Ranking objects in each sub-network induced from each cluster
Sub-Network Ranking
Clustering
8
Generating new measure space Estimate mixture model coefficients
for each target object Adjusting cluster
Until stable
RankClus: Integrating Clustering with Ranking
An E-M framework for iterative enhancement
Difference on ranking rules: 1. Simple ranking rule: Ranking high if more papers 2. Highly ranked if publishing more in highly ranked-
venues
9
Step-by-Step Running of RankClus
Initially, ranking distributions are mixed together
Two clusters of objects mixed together, but
preserve similarity somehow
Improved a little Two clusters are
almost well separated
Improved significantly
Stable
Well separated
Clustering and ranking two fields: DB/DM & HW/CA
10
NetClus: Ranking-Based Clustering with Star Network Schema
Beyond bi-typed network: capture more semantics with multiple types Split a network into multi-subnets, each a net-cluster [KDD’09]
Research Paper
Term
AuthorVenue
Publish Write
Contain
P
T
AV
P
T
AV
……
P
T
AVNetClus
Computer Science
Database
Hardware
Theory
DBLP network: using terms, venues & authors to jointly distinguish a sub-field, e.g., database
11
NetClus: Database System Cluster
database 0.0995511 databases 0.0708818
system 0.0678563 data 0.0214893
query 0.0133316 systems 0.0110413 queries 0.0090603
management 0.00850744 object 0.00837766
relational 0.0081175 processing 0.00745875
based 0.00736599 distributed 0.0068367
xml 0.00664958 oriented 0.00589557 design 0.00527672 web 0.00509167
information 0.0050518 model 0.00499396
efficient 0.00465707
Surajit Chaudhuri 0.00678065 Michael Stonebraker 0.00616469
Michael J. Carey 0.00545769 C. Mohan 0.00528346
David J. DeWitt 0.00491615 Hector Garcia-Molina 0.00453497
H. V. Jagadish 0.00434289 David B. Lomet 0.00397865
Raghu Ramakrishnan 0.0039278 Philip A. Bernstein 0.00376314
Joseph M. Hellerstein 0.00372064 Jeffrey F. Naughton 0.00363698 Yannis E. Ioannidis 0.00359853
Jennifer Widom 0.00351929 Per-Ake Larson 0.00334911 Rakesh Agrawal 0.00328274
Dan Suciu 0.00309047 Michael J. Franklin 0.00304099 Umeshwar Dayal 0.00290143
Abraham Silberschatz 0.00278185
VLDB 0.318495 SIGMOD Conf. 0.313903
ICDE 0.188746 PODS 0.107943 EDBT 0.0436849
One-level deeper: Authors in XML, Xquery cluster
Term Venue Author
Rank-Based Clustering: Works in Multiple Domains
12
RankCompete: Organize your photo album automatically!
MedRank: Rank treatments for AIDS from Medline
Use multi-typed image features to build up heterog. networks
Explore multiple types in a star schema network
13
Classification: Knowledge Propagation Across Heterogeneous Typed Networks
Labeled knowledge propagates through multi-typed objects across heterogeneous networks [ECMLPKDD'10, KDD’11]
RankClass: Ranking-Based Classification Class: a group of multi-typed nodes sharing the same topic + within-class ranking
node class
The RankClass Framework
14
Methodology of RankClass Classification in heterogeneous networks
Knowledge propagation: Class label knowledge propagated across multi-typed objects through a heterogeneous network
GNetMine [Ji et al., PKDD’10]: Objects are treated equally
RankClass [Ji et al., KDD’11]: Ranking-based classification
Highly ranked objects will play more role in classification
An object can only be ranked high in some focused classes
Class membership and ranking are stat. distributions
Let ranking and classification mutually enhance each other!
Output: Classification results + ranking list of objects within each class
15
16
Experiments on DBLP Class: Four research areas (communities)
Database, data mining, AI, information retrieval Four types of objects
Paper (14376), Conf. (20), Author (14475), Term (8920) Three types of relations
Paper-conf., paper-author, paper-term Algorithms for comparison
Learning with Local and Global Consistency (LLGC) [Zhou et al. NIPS 2003] – also the homogeneous version of our method
Weighted-vote Relational Neighbor classifier (wvRN) [Macskassy et al. JMLR 2007]
Network-only Link-based Classification (nLB) [Lu et al. ICML 2003, Macskassy et al. JMLR 2007]
Comparing Classification Accuracy on the DBLP Data
17
Object Ranking in Each Class: Experiment
DBLP: 4-fields data set (DB, DM, AI, IR) forming a heterog. info. network Rank objects within each class (with extremely limited label information) Obtain high classification accuracy and excellent ranking within each class
Database Data Mining AI IR
Top-5 ranked conferences
VLDB KDD IJCAI SIGIR
SIGMOD SDM AAAI ECIR
ICDE ICDM ICML CIKM
PODS PKDD CVPR WWW
EDBT PAKDD ECML WSDM
Top-5 ranked terms
data mining learning retrieval
database data knowledge information
query clustering reasoning web
system classification logic search
xml frequent cognition text
18
19
Outline Why Heterogeneous Information Networks?
On the Power of Mining Structured Heterogeneous Info. Networks
Clustering & Classification: A Ranking-Based Approach
Similarity Search & Meta Path in Heterogeneous Info. Net
Prediction & Recommendation: A Meta Path-Based Approach
From Unstructured Data to Structured Networks
From Unstructured Text to Structured Hierarchies
Type Extraction and Role Discovery: Enrich Network Structures
Truth Analysis: Enhance Quality of Structured Network
Looking Forward to the Future
Similarity Search: Find Similar Objects in Networks
Who are most similar to Christos Faloutsos? Meta-Path: Meta-level description of a path between
two objects
Christos’s students or close collaborators Similar reputation at similar venues
Meta-Path: Author-Paper-Author (APA) Meta-Path: Author-Paper-Venue-Paper-Author (APVPA)
20
Schema of the DBLP Network
Different meta-paths lead to very different results!
Different meta-paths carry rather different semantics
PathSim: A Het. Network Similarity Measure
Popular object similarity measures in networks Random walk (RW) (or Personalized PageRank): Favors highly
visible objects (i.e., objects with large degrees) Pairwise random walk (PRW) (or SimRank): Favors “pure”
objects (i.e., objects with highly skewed distribution in their in-links or out-links)
PathSim [VLDB’11] Favor “peers”: objects with strong connectivity and similar
visibility under the given meta-path
x y
21
Note: P-PageRank and SimRank do not distinguish object type nor relationship type
Which Similarity Measure Is Better?
Anhai Doan CS, Wisconsin Database area PhD: 2002
Meta-Path: Author-Paper-Venue-Paper-Author (APVPA)
• Jignesh Patel
• CS, Wisconsin
• Database area
• PhD: 1998
• Amol Deshpande
• CS, Maryland
• Database area
• PhD: 2004
• Jun Yang
• CS, Duke
• Database area
• PhD: 2001
22
23
Outline Why Heterogeneous Information Networks?
On the Power of Mining Structured Heterogeneous Info. Networks
Clustering & Classification: A Ranking-Based Approach
Similarity Search & Meta Path in Heterogeneous Info. Net
Prediction & Recommendation: A Meta Path-Based Approach
From Unstructured Data to Structured Networks
From Unstructured Text to Structured Hierarchies
Type Extraction and Role Discovery: Enrich Network Structures
Truth Analysis: Enhance Quality of Structured Network
Looking Forward to the Future
Meta path-guided prediction of links and relationships
Philosophy: Meta path relationships among similar typed links share similar semantics and are comparable and inferable
24
paper topic
venue
author
publish publish-1
mention-1
mention write write-1
contain/contain-1 cite/cite-1
Bibliographic network: Co-author prediction (A—P—A) Use topological features encoded by meta paths, e.g.,
citation relations between authors (A—P→P—A)
PathPredict: Meta-Path Based Relationship Prediction
vs.
Meta-Path Based Co-authorship Prediction
Co-authorship prediction problem Whether two authors are going to collaborate for the first time
Co-authorship encoded in meta-path Author-Paper-Author
Topological features encoded in meta-paths
Meta-paths between authors under length 4
Meta-Path Semantic Meaning
25
The Power of PathPredict: Experiments
Explain the prediction power of each meta-path Wald Test for logistic
regression Higher prediction accuracy
than using projected homogeneous network 11% higher in
prediction accuracy
26
Co-author prediction for Jian Pei: Only 42 among 4809 candidates are true first-time co-authors! (Feature collected in [1996, 2002]; Test period in [2003,2009])
Enhancing the Power of Recommender Systems by Heterog. Info. Network Analysis
Avatar Titanic Aliens Revolutionary Road
James Cameron
Kate Winslet
Leonardo Dicaprio
Zoe Saldana
Adventure Romance
Collaborative filtering methods suffer from the data sparsity issue
# of users or items
A small set of users & items have a large number of ratings
Most users and items have a small number of ratings
# of
ratin
gs
Heterogeneous relationships complement each other Users and items with limited feedback can be connected to the
network by different types of paths Connect new users or items in the information network
Different users may require different models: Relationship heterogeneity makes personalized recommendation models easier to define
27
Personalized recommendation with heterog. Networks [WSDM’14]
Performance Study: Personalized Recommendation in Heterogeneous Networks
Winner: HeteRec personalized recommendation (HeteRec-p)
Datasets:
Methods to be compared: Popularity: Recommend the most popular items to users Co-click: Conditional probabilities between items NMF: Non-negative matrix factorization on user feedback Hybrid-SVM: Use Rank-SVM with plain features (utilize both user
feedback and information network)
28
29
Outline Why Heterogeneous Information Networks?
On the Power of Mining Structured Heterogeneous Info. Networks
Clustering & Classification: A Ranking-Based Approach
Similarity Search & Meta Path in Heterogeneous Info. Net
Prediction & Recommendation: A Meta Path-Based Approach
From Unstructured Data to Structured Networks
From Unstructured Text to Structured Hierarchies
Type Extraction and Role Discovery: Enrich Network Structures
Truth Analysis: Enhance Quality of Structured Network
Looking Forward to the Future
Automated Construction of Heterogeneous Info. Networks
Rich semantics can be mined from structured heterog. networks
Unfortunately, much of the real world data is unstructured
News, Wikipedia, blogs, tweets, multimedia data, …
Too expensive to be structured by human: Automated & scalable
How to turn unstructured data to structured heterog. Networks?
From Unstructured Text to Structured Hierarchies
Type Extraction and Role Discovery: Enrich Network Structures
Truth Analysis: Enhance Quality of Structured Network
Methodology: Incrementally & progressively explore links and structures to construct quality heterogeneous networks
30
From Unstructured Text to Structured Hierarchies
Kert [Danilevsky et al, SDM’14]: LDA + Keyphrase extraction and ranking by topic (4 criteria: coverage, purity, phraseness, completeness)
CATHTY [Wang, et al., KDD’13]: Recursive generation of topic hierarchy based on term co-occurrence network
CATHYHIN [Wang, et al., ICDM’13]: Use hetero. network to help topic hierarchy generation
TopMine [El-Kishky et al., 2014]: Exploring frequent contiguous pattern mining
31
1. Hierarchical topic discovery with entities
2. Phrase mining
3. Rank phrases & entities per topic
Output hierarchy w/ phrases & entities
Input collection
text o
o/1
o/1/1 o/1/2
o/2
o/2/1
Entity
Hierarchy so generated can be used for network mining, exploration, etc.
Our recent effort:
Automatic Discovery of Clusters and Term Hierarchies from Collection of Documents
KERT: Keyphrase Extraction and Ranking by Topic [SDM’14] 1. Use background LDA (bLDA) to cluster unigrams into T topics + one background topic 2. Extract candidate keyphrases for each foreground topic according to the word topic assignment — should be order-free and non-consecutive
E.g.,‘mining frequent patterns’ vs.‘frequent pattern mining’ E.g., ‘mining top-k frequent closed patterns’
3. Rank the keyphrases in each foreground topic Coverage: information retrieval’ vs. ‘cross-language information
retrieval Purity: only frequent in documents about topic t Phraseness: ‘active learning’ vs.‘learning classification’ Completeness: ‘vector machine’ vs. ‘support vector machine’
32
Experiment: Machine Learning: Top Keyphrases
– Dataset: DBLP paper titles – KERT variations: ignore each criterion in turn – Baselines: two variations of an approach from a recent work
33
CATHYHIN: Topic Hierarchy Construction by Integration of Heterogeneous Info. Networks
Using DBLP heterog. Info. network to enhance topical hierarchy generation
Hierarchies generated not only on topical phrases but also on authors & venues
34
TopMine: Phrase Mining Then Top Modeling
Frequent Contiguous Phrase Mining Frequency: A low frequency phrase is likely unimportant --- a
generalization of the most likely unigrams in LDA Collocation: The co-occurrence of tokens in such frequency
that is significantly higher than expected due to chance. Completeness: Do not double count sub-phrases (e.g. support
vector is not a phrase when inside support vector machine) General frequent pattern mining may place words far apart in the
same phrase The pattern for a phrase: each phrase is a contiguous sequence of
ordered words. Markov Blanket Feature Selection for Support Vector Machines
35
Good Phrases Should Be Stat. Significant
36
Under the assumption that the text data was generated by a sequence of Bernoulli trials, we can derive a significance measure for potential mergings.
Good Phrases!
Bag-of-Phrase Construction: An Example
37
The ToPMine Framework
38
1. Preprocessing: Perform stemming to reduce vocabulary size and increase collisions of phrases, split on phrase-invariant punctuation
2. Frequent Contiguous Pattern Mining: Collects aggregate counts and candidate phrases
3. Phrase Construction: Agglomerative construction of significant phrases (addresses frequency, collocation, and completeness)
4. Use the phrase construction of “bag of phrases”: This can be considered an enriched input for later algorithms (not exclusive to topic modeling)
5. Bag-of-phrases Topic Modeling
Experiments on AP News
39
Experiments on Yelp Reviews
40
41
Scalability: Runtime Results
42
Outline Why Heterogeneous Information Networks?
On the Power of Mining Structured Heterogeneous Info. Networks
Clustering & Classification: A Ranking-Based Approach
Similarity Search & Meta Path in Heterogeneous Info. Net
Prediction & Recommendation: A Meta Path-Based Approach
From Unstructured Data to Structured Networks
From Unstructured Text to Structured Hierarchies
Type Extraction and Role Discovery: Enrich Network Structures
Truth Analysis: Enhance Quality of Structured Network
Looking Forward to the Future
43
Role Discovery: Mining Advisor-Advisee Relationships in DBLP Network
Propagation of simple, commonly accepted constraints in Time-Constrained Probabilistic Factor Graph (TPFG) “Advisor has more publications and longer history than advisee at the time
of advising” “Once an advisee becomes advisor, s/he will not become advisee again”
Smithth
2000
2000
2001
2002
2003
22000000000000000000
1999
Ada Bob
Jerry
Ying
Input: Temporalcollaborationnetwork
Output: Relationshipanalysis
(0.8, [1999,2000])
(0.7,[2000, 2001])
(0.65, [2002, 2004])
2004
Ada
Bob
Ying
Smith
(0.2,[2001, 2003])
(0.5, [/, 2000])
(0.9, [/, 1998])
(0.4,[/, 1998])
(0.49,[/, 1999])
Visualizedchorological hierarchies
Jerry
44
Role Discovery: Performance & Case Study
Datasets RULE SVM IndMAX TPFG
TEST1 69.9% 73.4% 75.2% 78.9% 80.2% 84.4%
TEST2 69.8% 74.6% 74.6% 79.0% 81.5% 84.3%
TEST3 80.6% 86.7% 83.1% 90.9% 88.8% 91.3%
Empirical parameter
optimized parameter
heuristics Supervised learning
DBLP data: 654, 628 authors, 1076,946 publications, years provided Labeled data: MathGealogy Project; AI Gealogy Project; Homepage
Advisee Top Ranked Advisor Time Note
David M. Blei 1. Michael I. Jordan 01-03 PhD advisor, 2004 grad
2. John D. Lafferty 05-06 Postdoc, 2006
Hong Cheng 1. Qiang Yang 02-03 MS advisor, 2003
2. Jiawei Han 04-08 PhD advisor, 2008
Sergey Brin 1. Rajeev Motawani 97-98 “Unofficial advisor”
Case study
45
Hierarchical Relationship Discovery From partially ordered objects to hierarchy (tree) Based on NLP or other techniques to extract
partially ordered objects Using constraints to discover relationships
Type Cognitive description Potential definition Homophile Parent and child are similar Polarity Parent is superior to child Support pattern
Patterns frequently occurring with child-parent pairs
Forbidden pattern
Patterns rarely occurring with child-parent pairs
Singleton Potential
Type Cognitive description Potential definition Attribute augment
Use inherited attributes from parents or children
Label propagate
Similar nodes share similar parents (or children)
Reciprocity Patterns altering in child-parent & parent-child pairs
Constraints Restrict certain patterns
Pairwise Potential Function: Cases
Discovery of the Kenny Family Tree
46
Outline Why Heterogeneous Information Networks?
On the Power of Mining Structured Heterogeneous Info. Networks
Clustering & Classification: A Ranking-Based Approach
Similarity Search & Meta Path in Heterogeneous Info. Net
Prediction & Recommendation: A Meta Path-Based Approach
From Unstructured Data to Structured Networks
From Unstructured Text to Structured Hierarchies
Type Extraction and Role Discovery: Enrich Network Structures
Truth Analysis: Enhance Quality of Structured Network
Looking Forward to the Future
Truth Analysis: Enhancing the Quality of Heterogeneous Info. Networks
Info. networks could be untrustworthy, error-prone, missing, … TruthFinder [KDD’07]: Inference on trustworthiness by
constructing heterogeneous info. networks
w1 f1
f2 w2
w3
w4 f4
Web sites (Book Sellers)
Facts (Book Authors)
o1
o2
Objects (Books)
f3
True facts and trustable websites mutually enhance each other and will become apparent after some iterations
47
Truth Discovery: Multiple Truth Value and Handling False Negatives
• Voting may not always work well: Some sources tend to miss true values (False Negatives), while some others tend to produce false claims (False Positives)
• Why Latent Truth Model (LTM)? Modeling two-sided quality to support multiple true values per entity for truth finding [VLDB’12]
IMDB
Negative Claim
Positive Claim
Generating Implicit Negative Claims:
Harry Potter
Netflix
BadSource
Correct Claim
Incorrect Claim
High Precision, High Recall
High Precision, Low Recall
Low Precision, Low Recall
48
Model source quality in other data integration tasks, e.g. entity resolution. Trustworthiness in multi-genre networks (text-rich networks, social networks, etc.)
Truth Discovery: Effectiveness of Latent Truth Model
Experimental datasets: Large and real Book Authors from abebooks.com (1263 books, 879 sources, 48153 claims, 2420 book-author, 100 labeled) Movie Directors from Bing (15073 movies, 12 sources, 108873 claims, 33526 movie-director, 100 labeled) Effectiveness of Latent Truth Model:
49
50
Outline Why Heterogeneous Information Networks?
On the Power of Mining Structured Heterogeneous Info. Networks
Clustering & Classification: A Ranking-Based Approach
Similarity Search & Meta Path in Heterogeneous Info. Net
Prediction & Recommendation: A Meta Path-Based Approach
From Unstructured Data to Structured Networks
From Unstructured Text to Structured Hierarchies
Type Extraction and Role Discovery: Enrich Network Structures
Truth Analysis: Enhance Quality of Structured Network
Looking Forward to the Future
On-Going and Future Work From small, structured data to real, big, unstructured data
From the DBLP network to networks constructed from news, PubMed, tweets, …
Construction of quality, semi-structured heterogeneous networks from massive, real data sets
Integration information network with data/text cube and OLAP technologies Data/text cubes have been shown its power in multi-
dimensional OLAP analysis, e.g., EventCube
Integration of heterogeneous network mining ⇒ OLAP mining on multi-dimensional social and information networks
Network exploration of mission- or query-based hidden networks
51
From Data Mining to Mining Info. Networks
52
Han, Kamber and Pei, Data Mining, 3rd ed. 2011
Yu, Han and Faloutsos (eds.), Link Mining, 2010
Sun and Han, Mining Heterogeneous Information Networks, 2012
53
Conclusions Heterogeneous information networks are ubiquitous
Most datasets can be “organized” or “transformed” into “structured” multi-typed heterogeneous info. networks
Examples: DBLP, IMDB, Flickr, Google News, Wikipedia, … Surprisingly rich knowledge can be mined from structured
heterogeneous info. networks Clustering, ranking, classification, path prediction, ……
Knowledge is power, but knowledge is hidden in massive, but “relatively structured” nodes and links!
Key issue: Construction of trusted, semi-structured heterogeneous networks from unstructured data
From data to knowledge: Much more to be explored but heterogeneous network mining has shown high promise!
References M. Ji, J. Han, and M. Danilevsky, "Ranking-Based Classification of Heterogeneous Information
Networks", KDD'11. Y. Sun and J. Han, Mining Heterogeneous Information Networks: Principles and Methodologies,
Morgan & Claypool Publishers, 2012 Y. Sun, J. Han, et al., "RankClus: Integrating Clustering with Ranking for Heterogeneous
Information Network Analysis", EDBT’09 Y. Sun, Y. Yu, and J. Han, "Ranking-Based Clustering of Heterogeneous Information Networks with
Star Network Schema", KDD’09 Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu, “PathSim: Meta Path-Based Top-K Similarity Search in
Heterogeneous Information Networks”, VLDB'11 Y. Sun, R. Barber, M. Gupta, C. Aggarwal and J. Han, "Co-Author Relationship Prediction in
Heterogeneous Bibliographic Networks", ASONAM'11 Y. Sun, J. Han, C. C. Aggarwal, N. Chawla, “When Will It Happen? Relationship Prediction in
Heterogeneous Information Networks”, WSDM'12 F. Tao, et al., “EventCube: Multi-Dimensional Search and Mining of Structured and Text Data”,
(system demo) KDD’13 C. Wang, J. Han, et al., “Mining Advisor-Advisee Relationships from Research Publication
Networks", KDD'10 C. Wang, M. Danilevsky, et al., “A Phrase Mining Framework for Recursive Construction of a
Topical Hierarchy”, KDD’13 C. Wang, Marina Danilevsky, et al., "Constructing Topical Hierarchies in Heterogeneous
Information Networks", ICDM'13 54