IBM Research
5/27/2007 | Information Flow Prediction and People Mining | Ching-Yung Lin © 2007 IBM Corporation
Information Flow Prediction and People Mining
Ching-Yung Lin
IBM T. J. Watson Research Center
May 27, 2007
Data Flow through an Internet Gateway
10 Gbit/s continuous feed coming into the system
Types of Data
• Speech, text, moving images, still images, coded application data, machine-to-machine binary communication
System Mechanisms
• Telephony: 9.6 Gbit/s (including VoIP)
• Internet
  - Email: 250 Mbit/s (about 500 pieces per second)
  - Dynamic web pages: 50 Mbit/s
  - Instant messaging: 200 Kbit/s
  - Static web pages: 100 Kbit/s
  - Transactional data: TBD
• TV: 40 Mbit/s (equivalent to about 10 stations)
• Radio: 2 Mbit/s (equivalent to about 20 stations)
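As a quick sanity check on the feed figures, the listed rates can be summed and the email rate converted to an average message size. This is only a worked-arithmetic sketch: the dictionary keys and function names are illustrative, and the transactional-data rate is left out because the slide lists it as TBD.

```python
# Feed rates from the slide, in bits per second.
# Transactional data is listed as TBD and is therefore omitted.
RATES_BPS = {
    "telephony": 9_600_000_000,
    "email": 250_000_000,
    "dynamic_web": 50_000_000,
    "instant_messaging": 200_000,
    "static_web": 100_000,
    "tv": 40_000_000,
    "radio": 2_000_000,
}


def total_gbit_per_s(rates):
    """Sum all component rates and express the total in Gbit/s."""
    return sum(rates.values()) / 1e9


def mean_email_bytes(email_bps, emails_per_s):
    """Average size of one email in bytes, given bit rate and message rate."""
    return email_bps / emails_per_s / 8


print(total_gbit_per_s(RATES_BPS))                # about 9.94, close to the 10 Gbit/s headline
print(mean_email_bytes(RATES_BPS["email"], 500))  # 62500.0 bytes per email
```

The components add up to roughly 9.94 Gbit/s, which is consistent with the 10 Gbit/s headline figure, and telephony accounts for almost all of it.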
Network Monitoring and Stream Analysis
[Figure: dataflow graph. Rates marked: inputs at 200-500 MB/s, about 100 MB/s per PE, and 10 MB/s. Processing elements: ip, tcp, udp, http, ftp, ntp, rtp, rtsp, sessvideo, sessaudio. Stages: packet content analysis, interest routing (keywords, id), advanced content analysis, interest filtering. Output: interested MM streams.]
By IBM Dense Information Gliding Team
Borrow this from Hoover...
One of the issues – Speech Recognition, Speaker & Social Network Detection
[Figure: speaker detection on Streams A, B, C and D identifies speakers (Olivier, Mihalis, Ching-Yung, Upendra, Deepak) and "talks to" links between them. Denoising & Social Network Analysis (social network, fusion technique, iterative method) produces the network after denoising.]
What can be achieved by combining content analysis and social network analysis?
Challenge – every node in the network is unique
Photo Source: New York Times, 3/2/2005
Part I: Dynamic Probabilistic Complex Network and Information Flow
The Most Difficult Challenge: What Is the State of the Art?
Social networks in sociology and statistics: focus on (1) overall network characteristics, (2) dynamic random graphs, (3) binary edges, etc. They do not consider probabilistic nodes/edges or individual nodes/edges.
Epidemic networks and computer virus networks: focus on (1) overall network characteristics, such as when an outbreak will occur, and (2) regular/random graphs. They do not focus on individual nodes/edges.
(Computer) communication networks: focus on (1) packet transmission, where information is not duplicated, or (2) broadcasting, which does not consider individual nodes/edges or complex network topology.
WWW: focus on (1) topology description and (2) binary edges and ranked nodes (e.g., Google PageRank). It does not consider probabilistic edges.
Our objectives: find important people, community structures, or information flow in a network that is dynamic, probabilistic and complex, in order to allocate resources in a large-scale mining system.
What is a Dynamic Probabilistic Complex Network?
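Modeling a Dynamic Probabilistic Complex Network
[Assumption] A DPCN can be represented by a Dynamic Transition Matrix P(t), a Dynamic Vertex Status Random Vector Q(t), and two dependency functions f_M and g_M:
\[
\mathbf{P}(t) =
\begin{bmatrix}
\mathbf{p}_{1,1}(t) & \mathbf{p}_{2,1}(t) & \cdots & \mathbf{p}_{N,1}(t) \\
\mathbf{p}_{1,2}(t) & \mathbf{p}_{2,2}(t) & \cdots & \mathbf{p}_{N,2}(t) \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{p}_{1,N}(t) & \mathbf{p}_{2,N}(t) & \cdots & \mathbf{p}_{N,N}(t)
\end{bmatrix},
\qquad
\mathbf{Q}(t) =
\begin{bmatrix}
\mathbf{q}_{1}(t) \\ \mathbf{q}_{2}(t) \\ \vdots \\ \mathbf{q}_{N}(t)
\end{bmatrix},
\]
where
\[
\mathbf{p}_{i,j}(t) =
\begin{bmatrix}
\Pr(y_{i,j}(t) = SE_1) \\ \Pr(y_{i,j}(t) = SE_2) \\ \vdots \\ \Pr(y_{i,j}(t) = SE_E)
\end{bmatrix},
\qquad
\sum_{k=1}^{E} \Pr(y_{i,j}(t) = SE_k) = 1,
\]
y_{i,j}(t): the status value of edge i → j at time t,
\[
\mathbf{q}_{i}(t) =
\begin{bmatrix}
\Pr(x_{i}(t) = SV_1) \\ \Pr(x_{i}(t) = SV_2) \\ \vdots \\ \Pr(x_{i}(t) = SV_V)
\end{bmatrix},
\qquad
\sum_{k=1}^{V} \Pr(x_{i}(t) = SV_k) = 1,
\]
x_i(t): the status value of vertex i at time t, and
\[
\mathbf{P}(t + \Delta t) = f_M(\mathbf{Q}(t), \mathbf{P}(t)),
\qquad
\mathbf{Q}(t + \Delta t) = g_M(\mathbf{P}(t + \Delta t), \mathbf{Q}(t), \mathbf{P}(t)).
\]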
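Information Flow in Dynamic Probabilistic Complex Network (let’s call it the Behavioral Information Flow (BIF) Model)
[Assumption] Edges can be represented by a four-state S-D-A-R (Susceptible-Dormant-Active-Removed) Markov model. Nodes can be represented by a three-state S-A-I (Susceptible-Active-Informed) model.
\[
\mathbf{p}_{i,j}(t) =
\begin{bmatrix}
\Pr(y_{i,j}(t) = S) \\ \Pr(y_{i,j}(t) = D) \\ \Pr(y_{i,j}(t) = A) \\ \Pr(y_{i,j}(t) = R)
\end{bmatrix},
\qquad
\mathbf{q}_{i}(t) =
\begin{bmatrix}
\Pr(x_{i}(t) = S) \\ \Pr(x_{i}(t) = A) \\ \Pr(x_{i}(t) = I)
\end{bmatrix},
\]
where the four entries of each p_{i,j}(t) sum to 1 and the three entries of each q_i(t) sum to 1,
\[
\mathbf{P}(t) =
\begin{bmatrix}
\mathbf{p}_{1,1}(t) & \mathbf{p}_{2,1}(t) & \cdots & \mathbf{p}_{N,1}(t) \\
\mathbf{p}_{1,2}(t) & \mathbf{p}_{2,2}(t) & \cdots & \mathbf{p}_{N,2}(t) \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{p}_{1,N}(t) & \mathbf{p}_{2,N}(t) & \cdots & \mathbf{p}_{N,N}(t)
\end{bmatrix},
\qquad
\mathbf{Q}(t) =
\begin{bmatrix}
\mathbf{q}_{1}(t) \\ \mathbf{q}_{2}(t) \\ \vdots \\ \mathbf{q}_{N}(t)
\end{bmatrix},
\]
and
\[
\mathbf{P}(t + \Delta t) = f(\mathbf{M}, \mathbf{Q}(t), \mathbf{P}(t)),
\qquad
\mathbf{Q}(t + \Delta t) = g(\mathbf{P}(t + \Delta t), \mathbf{Q}(t), \mathbf{P}(t)).
\]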
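One way to hold the BIF state in code is a dictionary of edge probability vectors plus a list of node probability vectors. The sketch below is illustrative only: the names `initial_state` and `is_normalised` are hypothetical, and the initial condition (all edges Susceptible, one seed node Informed) is an assumption rather than something the slides specify.

```python
EDGE_STATES = ("S", "D", "A", "R")  # Susceptible, Dormant, Active, Removed
NODE_STATES = ("S", "A", "I")       # Susceptible, Active, Informed


def initial_state(n_nodes, edges, seed):
    """Build P and Q for a network in which only `seed` is informed.

    P[(i, j)] plays the role of p_{i,j}(t): probabilities over EDGE_STATES.
    Q[i] plays the role of q_i(t): probabilities over NODE_STATES.
    Assumption: every edge starts Susceptible, every non-seed node too.
    """
    P = {(i, j): [1.0, 0.0, 0.0, 0.0] for (i, j) in edges}
    Q = [[1.0, 0.0, 0.0] for _ in range(n_nodes)]
    Q[seed] = [0.0, 0.0, 1.0]
    return P, Q


def is_normalised(P, Q, tol=1e-9):
    """Check that every probability vector is non-negative and sums to 1."""
    vectors = list(P.values()) + Q
    return all(abs(sum(v) - 1.0) <= tol and min(v) >= 0.0 for v in vectors)


P, Q = initial_state(3, [(0, 1), (1, 2)], seed=0)
print(is_normalised(P, Q))  # True
```

Storing only the edges that exist keeps the structure sparse, which matters for networks of hundreds of nodes where most node pairs have no edge.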
Major Difference between BIF and Prior Modeling Methods in Epidemic Research and Computer Virus Fields
Prior models: model human nodes as S-I-R (Susceptible, Infected, and Removed).
They did not consider how an individual node’s behavior differs with network structure/topology, and did not consider edge status.
We propose to model edge status as an (autonomous) S-D-A-R Markov model (Susceptible, Dormant, Active, Removed).
We propose to model human node behavior as S-A-I (Susceptible, Active, and Informed).
Edges are Markov State Machines, Nodes are not
State transitions of edges: S-D-A-R model (Susceptible, Dormant, Active, and Removed). This indicates how the state of an edge changes over time.
[Figure, edge view: state chain S → D → A → R, with the S → D transition taken on trigger.]
States of nodes: S-A-I model (Susceptible, Active, and Informed). Trigger occurs when the start node of the edge changes from state S to state I.
[Figure, node view and network view: node state chain S → A → I, with the trigger at I.]
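Edge State Probability and Network Configuration Model
Nodes and Edges
\[
\mathbf{P}(t + \Delta t) = f(\mathbf{M}, \mathbf{Q}(t), \mathbf{P}(t)),
\]
\[
\mathbf{M} =
\begin{bmatrix}
(\alpha_{1,1}, \beta_{1,1}, \gamma_{1,1}) & (\alpha_{2,1}, \beta_{2,1}, \gamma_{2,1}) & \cdots & (\alpha_{N,1}, \beta_{N,1}, \gamma_{N,1}) \\
(\alpha_{1,2}, \beta_{1,2}, \gamma_{1,2}) & (\alpha_{2,2}, \beta_{2,2}, \gamma_{2,2}) & \cdots & (\alpha_{N,2}, \beta_{N,2}, \gamma_{N,2}) \\
\vdots & \vdots & \ddots & \vdots \\
(\alpha_{1,N}, \beta_{1,N}, \gamma_{1,N}) & (\alpha_{2,N}, \beta_{2,N}, \gamma_{2,N}) & \cdots & (\alpha_{N,N}, \beta_{N,N}, \gamma_{N,N})
\end{bmatrix}
\]
M is the Network Configuration Model, which is learned by training. It includes the network topology information, the long-term edge probability, and the delay parameters.
α_{i,j} = 0: no edge between i and j. Our KDD 2005 paper is a special case in which α_{i,j} = 1 or 0, and it did not model (β_{i,j}, γ_{i,j}).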
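Define Edge State Probability Update Function
\[
\mathbf{P}(t + \Delta t) = f(\mathbf{M}, \mathbf{Q}(t), \mathbf{P}(t))
\]
Given three different cases:
1. On trigger, x_i(t + Δt) = I and x_i(t) ≠ I:
\[
\mathbf{p}_{i,j}(t + \Delta t) = \mathbf{F}\,\mathbf{p}_{i,j}(t),
\qquad
\mathbf{F} =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
\alpha_{i,j} & 1 - \beta_{i,j} & 0 & 0 \\
0 & \beta_{i,j} & 1 - \gamma_{i,j} & 0 \\
1 - \alpha_{i,j} & 0 & \gamma_{i,j} & 1
\end{bmatrix}
\]
2. No trigger – node not informed yet, x_i(t + Δt) ≠ I and x_i(t) ≠ I:
\[
\mathbf{p}_{i,j}(t + \Delta t) = \mathbf{p}_{i,j}(t)
\]
3. No trigger – node has been informed, x_i(t + Δt) = I and x_i(t) = I:
\[
\mathbf{p}_{i,j}(t + \Delta t) = \mathbf{F}\,\mathbf{p}_{i,j}(t)
\]
[Figure: edge state chain S → D → A → R, with the S → D transition taken on trigger.]
Therefore, considering the probabilities of the node states, we get the Edge State Probability Update function f(.) such that:
\[
\mathbf{p}_{i,j}(t + \Delta t) = \Pr(x_i = I)\,\mathbf{F}\,\mathbf{p}_{i,j}(t) + \bigl(1 - \Pr(x_i = I)\bigr)\,\mathbf{p}_{i,j}(t)
\]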
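The edge update can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the parameter names `alpha` (probability that a triggered edge becomes Dormant rather than Removed), `beta` (per-step Dormant → Active probability) and `gamma` (per-step Active → Removed probability) are hypothetical labels for the edge's configuration parameters, and `informed_prob` stands for the probability that the edge's source node is Informed.

```python
def edge_transition_matrix(alpha, beta, gamma):
    """Column-stochastic matrix over (S, D, A, R).

    Column k holds the one-step probabilities of leaving state k.
    The parameter meanings are assumptions made for this sketch.
    """
    return [
        [0.0,         0.0,        0.0,         0.0],
        [alpha,       1.0 - beta, 0.0,         0.0],
        [0.0,         beta,       1.0 - gamma, 0.0],
        [1.0 - alpha, 0.0,        gamma,       1.0],
    ]


def mat_vec(matrix, vector):
    """Plain matrix-vector product for small dense lists."""
    return [sum(row[c] * vector[c] for c in range(len(vector))) for row in matrix]


def update_edge(p, informed_prob, alpha, beta, gamma):
    """Mix the 'source informed' update F p with the 'not informed' update p."""
    moved = mat_vec(edge_transition_matrix(alpha, beta, gamma), p)
    return [informed_prob * m + (1.0 - informed_prob) * old
            for m, old in zip(moved, p)]


p = [1.0, 0.0, 0.0, 0.0]                      # edge starts Susceptible
p = update_edge(p, 1.0, alpha=0.5, beta=0.25, gamma=0.5)
print(p)                                      # [0.0, 0.5, 0.0, 0.5]
```

Because every column of the matrix sums to 1, the updated vector remains a probability distribution for any `informed_prob` between 0 and 1.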
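Nodes: State Transitions Determined by Incoming Edges
Node State Probability Update Function g(.):
\[
\mathbf{Q}(t + \Delta t) = g(\mathbf{P}(t + \Delta t), \mathbf{Q}(t), \mathbf{P}(t)),
\]
[Figure, network view: node state chain S → A → I, with the trigger at I.]
\[
\mathbf{q}_{i}(t + \Delta t) = \mathbf{Q}_{i}\,\mathbf{q}_{i}(t),
\qquad
\mathbf{Q}_{i} =
\begin{bmatrix}
1 - a_i & 0 & 0 \\
a_i & 1 - r_i & 0 \\
0 & r_i & 1
\end{bmatrix},
\]
where each entry is a product over the incoming edges of node i, for example
\[
r_i = \Pr\bigl(\exists\, n \in \{1, \dots, N\}:\ y_{n,i}(t + \Delta t) = R,\ y_{n,i}(t) = A\bigr)
= 1 - \prod_{n \in \Omega_{V,i}} \Bigl(1 - \Pr\bigl(y_{n,i}(t + \Delta t) = R,\ y_{n,i}(t) = A\bigr)\Bigr),
\]
a_i is the corresponding probability that node i leaves state S, and Ω_{V,i} is the set of all source nodes of the incoming edges of Node i:
\[
\Omega_{V,i} = \{\, n \mid n \in \{1, \dots, N\},\ \alpha_{n,i} > 0 \,\}
\]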
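The "at least one incoming edge fires" pattern, 1 minus a product of complements, can be sketched as follows. This is a hedged illustration: `update_node`, `to_active` and `to_informed` are hypothetical names, incoming edges are assumed independent, and the choice of which edge transition moves a node from S to A is an assumption of this sketch.

```python
from math import prod


def prob_any(probabilities):
    """Probability that at least one of several independent events occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def update_node(q, to_active, to_informed):
    """One step of a three-state S-A-I node.

    q            : [Pr(S), Pr(A), Pr(I)] for the node.
    to_active    : per incoming edge, probability that the edge makes the
                   transition that activates the node (assumed).
    to_informed  : per incoming edge, probability that the edge moves from
                   Active to Removed, which informs the node.
    """
    s, a, i = q
    leave_s = prob_any(to_active)
    leave_a = prob_any(to_informed)
    return [
        s * (1.0 - leave_s),
        s * leave_s + a * (1.0 - leave_a),
        a * leave_a + i,
    ]


print(update_node([1.0, 0.0, 0.0], to_active=[0.5, 0.5], to_informed=[]))
# [0.25, 0.75, 0.0]
```

A node with no incoming edges never changes state in this sketch, because the product over an empty set is 1.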
An Application of Information Flow Prediction – find important people
Who are the most likely people to talk about this information at a specific time given the current observation?
\[
(m, n) = \arg\max_{m, n \in \{1, \dots, N\}} \Pr\bigl(y_{m,n}(t) = A\bigr)
\quad \text{given } (\mathbf{P}(t), \mathbf{Q}(t)) \text{ or } \mathbf{Q}(t)
\]
For a given concrete observation, the values in the given priors P(t), Q(t) are either 0 or 1.
For speaker recognition results, the priors can be confidence values between 0 and 1.
Case Study I – Switchboard data from 679 people
Monte Carlo method: simulate each DPCN information flow 1000 times.
It takes 12 seconds to predict the process with MC simulation. (For a given model, testing all 679 nodes takes a PC 130 minutes to calculate the probabilities when the information flow starts from each of the 679 seeds.)
[Figure: The probabilities that the nodes receive information, for seed ID 100. x-axis: node index (1 to 679); y-axis: probability (0 to 0.3).]
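The Monte Carlo estimate can be illustrated with a deliberately simplified stand-in. The sketch below is not the BIF simulator: it ignores the Dormant and Active delay states and treats each edge as a single Bernoulli trial (an independent-cascade style simplification), and the names `simulate_once` and `receive_probabilities` are hypothetical.

```python
import random


def simulate_once(edges, seed, rng):
    """One cascade. edges: {source: [(target, prob), ...]}.

    Each edge is tried once, when its source first becomes informed.
    Returns the set of nodes that received the information.
    """
    informed = {seed}
    frontier = [seed]
    while frontier:
        node = frontier.pop()
        for target, prob in edges.get(node, []):
            if target not in informed and rng.random() < prob:
                informed.add(target)
                frontier.append(target)
    return informed


def receive_probabilities(edges, seed, n_nodes, runs=1000, rng_seed=0):
    """Fraction of runs in which each node received the information."""
    rng = random.Random(rng_seed)
    counts = [0] * n_nodes
    for _ in range(runs):
        for node in simulate_once(edges, seed, rng):
            counts[node] += 1
    return [c / runs for c in counts]


toy_edges = {0: [(1, 0.3), (2, 0.1)], 1: [(2, 0.5)]}
print(receive_probabilities(toy_edges, seed=0, n_nodes=3))
```

On the timing quoted above: 679 seeds at about 12 seconds each comes to roughly 136 minutes, which is in line with the reported 130 minutes for the full sweep.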
The distribution histogram of the alpha values of the edges in the Enron dataset.
[Figure: histogram with a logarithmic count axis (1 to 100000). Series: All Topics, Market Opportunity, California Market, North America Product.]
Noise Factor I – Impact of Classification Error from Speaker Recognition
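Assume the classification precision rate on the speaker (node) i is ρ_i, and the false alarm rate on the speaker i is φ_i. Hats denote detected quantities.
Then the expected number of times that the node is counted is:
\[
\hat{K}_i = \rho_i K_i + 2 \varphi_i Z
\]
And the expected number of times that the link is counted is:
\[
\hat{L}_{i,j} = \rho_i \rho_j L_{i,j} + \varphi_i \varphi_j Z
\]
Therefore,
\[
\hat{\alpha}_{i,j} = \frac{\hat{L}_{i,j}}{\hat{K}_i} = \frac{\rho_i \rho_j L_{i,j} + \varphi_i \varphi_j Z}{\rho_i K_i + 2 \varphi_i Z}
\]
If we assume a universal precision and false alarm rate at all speakers, then:
\[
\hat{\alpha}_{i,j} = \frac{\rho^2 L_{i,j} + \varphi^2 Z}{\rho K_i + 2 \varphi Z}
\]
Assume the average waiting time of links and the average transmission duration of links are the same regardless of the links observed, then:
\[
\hat{\beta}_{i,j} = \beta_{i,j} \quad \text{and} \quad \hat{\gamma}_{i,j} = \gamma_{i,j}
\]
If we assume the false alarm rate is small and can be neglected when the number of nodes is large, then
\[
\hat{\alpha}_{i,j} \approx \rho\, \alpha_{i,j}
\]
[Figure: truth versus detected counts. Truth: K and Z. Detected: ρ_i K and φ_i · 2Z.]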
Speaker Recognition Accuracy can be Improved by Fusion of Original Speaker Recognition and Predicted Node Probability
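We can use this fusion method to combine both the speaker recognition result and the estimated node probability:
\[
\hat{\rho}_i = \frac{\rho_i \theta_i}{\rho_i \theta_i + \sum_{k} \varphi_{i,k}\, \theta_k},
\]
where θ_k is the node probability of speaker k estimated by the BIF model, which is guaranteed to be increasing when θ_i > θ_k.
[Figure: Before fusion, the Speaker i recognizer has precision ρ_i and false alarm rates φ_{i,k1}, φ_{i,k2}, φ_{i,k3}. After fusion with BIF prediction, the Speaker i recognizer output is combined with the BIF prediction (ρ_i θ_i).]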
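A fusion of recognizer output with a predicted node probability can be sketched as a Bayes-style reweighting. This is an assumed form for illustration, not a confirmed transcription of the slide's formula; the function name `fuse` and the argument names are hypothetical.

```python
def fuse(i, precision, false_alarm, predicted):
    """Score that the recogniser's 'speaker i' output is correct.

    precision[i]      : recogniser precision on speaker i.
    false_alarm[i][k] : rate at which recogniser i fires on speaker k (k != i).
    predicted[k]      : predicted probability that node k is speaking.
    Assumed form: hit / (hit + weighted false alarms).
    """
    hit = precision[i] * predicted[i]
    miss = sum(rate * predicted[k] for k, rate in false_alarm[i].items())
    total = hit + miss
    return hit / total if total > 0 else 0.0


precision = {0: 0.8}
false_alarm = {0: {1: 0.2}}

# With equal predicted probabilities the fused score equals the raw precision.
print(fuse(0, precision, false_alarm, {0: 0.5, 1: 0.5}))
# When the prediction favours speaker 0, the fused score rises above 0.8.
print(fuse(0, precision, false_alarm, {0: 0.9, 1: 0.1}))
```

In this form the fused score improves on the raw precision exactly when the predicted probability of the true speaker exceeds that of the confusable speakers, which matches the condition stated on the slide.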
Recognition Result from Switchboard-2 Telephone Conversation Set
Improvement in recognition accuracy on Node 171. The x-axis is the number of times the model has been updated based on the recognition result after fusion. The y-axis represents the recognition accuracy. In the six test cases, Node 171 is usually confused with Node 218 or Node 164. In the first two cases, there is no false alarm from the classification of Node 218 or 164. In the next two cases, they are usually confused with each other. In the last two cases, the false alarm probability from Node 218 or 164 is 0.3.
[Figure: recognition accuracy (0 to 1.2) against update step (0 to 5). Series: Node 218, no false alarm; Node 164, no false alarm; Node 218, mutually confused; Node 164, mutually confused; Node 218, prob. false alarm = 0.3; Node 164, prob. false alarm = 0.3.]
Case Study (II) – our experiments on Enron Emails
Modeling and Predicting Topic-Related Personal Information Flow
Content-Time-Relation Model: combines content, time and social relation information with Dirichlet allocations and a causal Bayesian network. [Song et al., KDD, August 2005] (1st paper combining content analysis and social network analysis)
Given the sender and the time of an email:
1. Get the probability of a topic given the sender
2. Get the probability of the receiver given the sender and the topic
3. Get the probability of a word given the topic
[Figure: plate diagram of the model with labels A, N, D, T, a_d, z, w, r, S, T_m, t. Marked nodes are observations; boxes represent iteration.]
a: sender/author, z: topic, S: social network (Exponential Random Graph Model / p* model), D: document/email, r: receivers, w: content words, N: word set, T: topic
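The three steps chain together as a sum over topics: P(z | a) · P(r | a, z) · P(w | z). The sketch below shows that marginalisation with toy lookup tables; the table layout, the function name and all the numbers are invented for illustration, and the time dependence of the model is left out.

```python
def receiver_word_probability(sender, receiver, word, p_topic, p_receiver, p_word):
    """Sum over topics z of P(z | a) * P(r | a, z) * P(w | z).

    p_topic[a][z]        : probability of topic z given sender a.
    p_receiver[(a, z)][r]: probability of receiver r given sender a and topic z.
    p_word[z][w]         : probability of word w given topic z.
    Missing entries are treated as probability 0.
    """
    total = 0.0
    for topic, pz in p_topic[sender].items():
        pr = p_receiver.get((sender, topic), {}).get(receiver, 0.0)
        pw = p_word.get(topic, {}).get(word, 0.0)
        total += pz * pr * pw
    return total


# Toy tables (illustrative values only).
p_topic = {"alice": {"energy": 0.5, "meetings": 0.5}}
p_receiver = {("alice", "energy"): {"bob": 1.0},
              ("alice", "meetings"): {"bob": 0.5}}
p_word = {"energy": {"power": 0.5}, "meetings": {"power": 0.0}}

print(receiver_word_probability("alice", "bob", "power",
                                p_topic, p_receiver, p_word))  # 0.25
```

Only the "energy" topic contributes here (0.5 · 1.0 · 0.5), since the word has zero probability under the other topic.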
Corporate Topic Trend Analysis Example: Yearly repeating events
[Figure: Topic Trend Comparison. x-axis: month (Jan to Nov); y-axis: popularity (0 to 0.03). Series: Topic45(y2000), Topic45(y2001), Topic19(y2000), Topic19(y2001).]
Topic 45, which concerns a scheduling issue, peaks from June to September. Topic 19 concerns a meeting issue. The trends repeat from year to year.
Topic Detection and Key People Detection of “California Power” Match Their Real-Life Roles
[Figure (a): Topic Analysis for Topic 61. x-axis: Jan-00 to Oct-01; y-axis: popularity (0 to 0.018).]
Key Words
  power        0.089361
  California   0.088160
  electrical   0.087345
  price        0.055940
  energy       0.048817
  generator    0.035345
  market       0.033314
  until        0.030681
Key People
  Jeff_Dasovich     0.249863
  James_Steffes     0.139212
  Richard_Shapiro   0.096179
  Mary_Hain         0.078131
  Richard_Sanders   0.052866
  Steven_Kean       0.044745
  Vince_Kaminski    0.035953
The event “California Energy Crisis” occurred at exactly this time period. The key people were active in this event, except Vince_Kaminski …
Social Network of Enron Managers
If we try to find social networks based on all communications, it is difficult.
Information Flow in Enron – California Market
Actor 151 (Rosalee Fleming, assistant to the Enron CEO Ken L.) is the key information spreader of this issue.
Information Flow in Enron – Market Opportunities
Rosalee Fleming also played an important role in “Market Opportunities.” She received information from Actor 119 (Mike Carson) and Actor 23 (James Steffes, VP of Gov. Affairs of Enron). Actor 68 (Rod Hayslett, CFO) is also a major information spreader.
Information Flow in Enron – North American Products
Two disjoint communities can be observed. Actor 21 (Keith Holst) and Actor 142 (Dan Hyvl) are the main bridges between the two communities.
This kind of analysis is wonderful, but...
We cannot wait until our company has a scandal and goes bankrupt...
What kinds of valuable applications can come out of network analysis?
Part II: Small Blue
Social Network -- A key differentiator for corporate performance
Informal social networks within formal organizations are a major factor affecting companies’ performance:
Krackhardt (CMU, 2005) showed that companies with strong informal networks perform five or six times better than those with weak networks.
Brydon (Visible Path, 2006) showed the performance gain of companies utilizing social networks:
• 16x in sales
• 4x in marketing
• 10x in hiring
We hope social network and expertise mining can dramatically increase our colleagues’ knowledge and collaboration
Social Networks -- Beyond the organizational chart
Source: Cross, R., Parker, A., Prusak, L. & Borgatti, S.P. 2001. Knowing What We Know: Supporting Knowledge Creation and Sharing in Social Networks. Organizational Dynamics 30(2): 100-120.
Organization charts are not the best indicator of how work gets done
Senior people are not always central; peripheral people can represent untapped knowledge
Making the network visible makes it actionable and becomes the basis for a collaboration action plan
Provided by Drs. Tony Mobbs and Kate Ehrlich, IBM
Group and Roles
[Figure: network of people in three functional groups (Marketing, Finance, Manufacturing): Andy, Bob, Carl, Darren, Earl, Frank, Indojit, Gerry, Harry, Jeff, Sam, Karen, Leo, Ming, Neo.]
Central people: Sam. Could be a bottleneck or holding the group together.
Peripheral people: Earl. Goes to others, but no one goes to him for information. At risk of leaving. Potentially unrealized expertise.
Sub-groups: Group split by function. Very little information shared across groups.
This slide is excerpted from SNA Theory, Concepts and Practice by Dr. T. Mobbs, BCS and Dr. K. Ehrlich, Research
Some Roles are especially critical
[Figure: the same network without Sam. Groups: Marketing, Finance, Manufacturing. People: Andy, Bob, Carl, Darren, Earl, Frank, Indojit, Gerry, Harry, Jeff, Karen, Leo, Ming, Neo.]
What happens if Sam leaves the group through layoffs, job reassignment, attrition, merger, or retirement?
This slide is excerpted from SNA Theory, Concepts and Practice by Dr. T. Mobbs, BCS and Dr. K. Ehrlich, Research
Relationships are multi-dimensional and (traditionally) uncovered through network questions
• Communication: How often do you communicate with this person?
• Innovation: How often do you turn to this person for new ideas?
• Advice: How often do you seek advice from this person before making an important decision?
• Awareness: I am aware of this person’s knowledge and skills.
• Learning: How likely are you to rely on this person for advice on new methods and processes?
• Valued Expertise: How likely are you to turn to this person for specialized expertise?
• Trust: I believe there is a high personal cost in seeking advice or support from this person.
• Access: I believe this person will respond to my request in a reasonable and timely manner.
• Energy: I generally feel energized when I interact with this person.
Categories: Actions, Awareness, Emotional
Provided by Drs. Tony Mobbs and Kate Ehrlich, IBM
Personal Network preferred source for information and collaboration
• Under-utilisation of electronic products and services.
• Content has lower performance impact / not realising full potential benefits.
• Widely inconsistent working practices.
[Figure: a GBS practitioner with a task in a project / delivery environment chooses between the Personal Network (preferred / primary mode) and the Existing Resources Provided: PSN, Methods, Education, Communities, other w3 content, KnowledgeView, Project Repositories, Collaboration, Project Tools, each reached through a W3 stub or client.]
Existing resources: standalone, disparate, poor integration, large number of sources, steep learning curve (identify, understand & synthesise into specific work context), difficult to locate, choose & use.
Forces:
• Time constrained
• Delivery activity focus
• What gets measured gets done
• Expedience
• Perceived value (return on time investment)
High reliance on:
• 50% ~ 75%: Personal networks (Gartner Report, 2006)
• Hard-drive materials
• What has worked for them previously (personal experience)
leads to
• fast turnaround of request
• specific response
• small # of relevant items returned
• recommendation of quality
• ability to quickly understand the supplied resource & determine relevant parts
• additional context / value-add info not available in electronic materials
Who knows what? How to reach them? Who plays what hidden roles?
Mining Expertise, Interests and Social Network
People can be “known” by:
public resources:
• publications
• personal webpages
• blogs
• presentations
• wiki
organizational resources:
• patent applications
• bluepages
personal resources:
• emails
• instant messaging
• meetings
• phone calls
• face-to-face interactions
Expertise can also be inferred from a person’s friends’ recommendations or expertise.
[Figure: scale from public to private, labeled "timely & abundant resources for expertise modeling".]
SmallBlue Find
SmallBlue Reach
SmallBlue Ego
SmallBlue Connect
SmallBlue Expand
SmallBlue Inference Engines and Servers
SmallBlue Clients (Distributed Automatic Social Sensors)
Private & Personalized
Public & Personalized
Public
External Data: Bluepages, BlueGroups, CommunityMap, BlogCentral, IBM Forum, KnowledgeView, Social Bookmark
My friends’ social values to me
Evolution of my Ego net
My personal network (Ego net) inferred from my Notes emails in server/local/archive and SameTime chats
Inference of my understanding on my friends’ expertise
Corporate-wise ranked experts
Ranked experts in my extended personal network, in a business unit and/or in a country
Only Public Information is shown
My social paths to her: which friends can introduce her, which friends work with her, .. trust, awareness, collaboration.
Her public postings, profiles, and communities to judge whether she is the right person.
Who I may want to know..
Which communities I may want to join..
Which documents I may want to look at
how to reach a person
social network analysis of Top-K experts
SNA of a formal group, a bluegroup or a community
social network analysis of a list of people
Other IBMers’ EgoNets
Other IBMers’ Expertise Inferences
I cannot see their communications, EgoNets nor Expertise Inferences
social network info
user search experts or person
social network analysis (SNA): who are the key persons in this network? who are the major hubs? who are the major bridges?
Major Use of SmallBlue Find
Find out who the experts are for any search term. (Right now, zillions of possible terms.)
Rank them based on collaborative expert recommendation.
Can show experts based on:
• the whole corporation
• business unit
• country
• my personal proximity
Collaborative Expert Recommendation
Combine everyone’s knowledge of the expertise of our colleagues.
The more recommendations from more colleagues, the higher the score.
The more recommendations from my trusted colleagues, the higher the score.
The higher the recommendation scores from colleagues, the higher the overall score.
Combining all IBMers’ knowledge, we can make an advanced expert-finding search engine.
Utilizing the expert search engine, we can enhance all IBMers’ knowledge and social connections.
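The three scoring rules above can be satisfied by a simple trust-weighted sum, sketched below. This is one hypothetical scoring function chosen because it is monotone in all three factors; the slides do not give SmallBlue's actual ranking formula, and `expert_scores`, `trust` and `default_trust` are illustrative names.

```python
def expert_scores(recommendations, trust, default_trust=1.0):
    """Rank candidates by a trust-weighted sum of recommendation scores.

    recommendations: {recommender: {candidate: score >= 0}}.
    trust          : {recommender: weight >= 0} from the searcher's viewpoint.
    The score grows with more recommenders, with higher trust in them,
    and with higher individual recommendation scores.
    """
    scores = {}
    for recommender, recs in recommendations.items():
        weight = trust.get(recommender, default_trust)
        for candidate, score in recs.items():
            scores[candidate] = scores.get(candidate, 0.0) + weight * score
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


recommendations = {
    "ann": {"dan": 1.0, "eve": 0.5},
    "bob": {"dan": 1.0},
}
trust = {"ann": 2.0}  # the searcher trusts ann more than the default

print(expert_scores(recommendations, trust))
# [('dan', 3.0), ('eve', 1.0)]
```

Making `trust` depend on the searcher is what lets the same recommendation data produce both a corporate-wide ranking (uniform trust) and a personalised one.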
SmallBlue Reach Paths help users reach another person
SmallBlue Reach Paths show the shortest paths for me to reach a person up to 6 degrees away.
SmallBlue Reach Paths can be initiated from any one of three SmallBlue applications.
Can be used for:
Access -- knowing who can help introduce me to this person.
Trust -- knowing who in my social networks knows this person.
Get familiar with -- knowing what kinds of people are in contact with this person.
Initiate communication -- knowing who we know in common.
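A shortest path with a degree cap can be found with breadth-first search. The sketch below is a generic BFS under assumed inputs, not SmallBlue's implementation: the adjacency-dictionary format and the name `reach_path` are hypothetical, and it returns a single shortest path rather than all of them.

```python
from collections import deque


def reach_path(graph, source, target, max_degrees=6):
    """Return one shortest path from source to target as a list of people,
    or None if the target is more than max_degrees hops away.

    graph: {person: iterable of direct contacts}.
    """
    if source == target:
        return [source]
    parent = {source: None}
    queue = deque([(source, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_degrees:
            continue  # do not expand beyond the degree cap
        for contact in graph.get(person, ()):
            if contact in parent:
                continue
            parent[contact] = person
            if contact == target:
                # Walk back through the parents to rebuild the path.
                path = [contact]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append((contact, depth + 1))
    return None


graph = {"me": ["ann"], "ann": ["bob"], "bob": ["carol"]}
print(reach_path(graph, "me", "carol"))                 # ['me', 'ann', 'bob', 'carol']
print(reach_path(graph, "me", "carol", max_degrees=2))  # None
```

The intermediate people on the returned path are the candidates for an introduction, which is the "Access" use listed above.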
SmallBlue Ego
How healthy is my personal social capital?
What is the social value of Alice to me?
What are the changes and trends in the evolution of my social capital? For instance, I have to talk to Alice soon. She is valuable to me in terms of social connections and she is drifting out of my Ego net circle.
SmallBlue Connect
Enterprise Social Network Analysis Tool
Showing Social Networks of people based on:
expertise key words
formal hierarchy
Any list of emails
Utilizing Social Network Analysis to show:
who are the important hubs among experts
who are the important bridges linking groups
Privacy Consideration – Bottom Line
Employees’ communications (e.g., time, from, to, cc, subject, content of emails, SameTime, etc.) are NOT searchable by or retrievable to anyone.
Employees’ knowledge of other employees is INFERRED. Only the aggregated inferred knowledge is searchable. It is NOT possible to guess which part of the aggregated inferred knowledge was contributed by whom.
In the social network analysis graphs, people’s relationships are modeled by their multimodal generic relationships. There is NO clue to their communication content.
Only the employee’s outgoing emails and instant messages, and only the portion authored by the employee, are utilized.
Anyone can suggest keywords that should not be searched or search terms that should not find them, or ask to be removed from the system.
Preliminary User Evaluation
Scores (5 – very satisfied)

                        5     4     3     2     1
Capability             24%   42%   17%   17%    0%
Usability              28%   33%    5%   25%   10%
Search                 10%   43%   23%   22%    2%
Reliability            28%   38%   17%   12%    5%
Performance            15%   45%   25%   13%    3%
Privacy                29%   34%   34%    3%    0%
Personal Network       15%   50%   13%   23%    0%
Overall Satisfaction   17%   49%   17%   15%    2%
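The score distributions can be condensed into a mean score per category. The sketch below is a worked-arithmetic aid rather than part of the original evaluation; `mean_score` is a hypothetical helper, and it divides by the row total because some rows sum to 101% after rounding.

```python
# Percentage of respondents giving each score, in the order 5, 4, 3, 2, 1.
RESULTS = {
    "Capability": [24, 42, 17, 17, 0],
    "Usability": [28, 33, 5, 25, 10],
    "Search": [10, 43, 23, 22, 2],
    "Reliability": [28, 38, 17, 12, 5],
    "Performance": [15, 45, 25, 13, 3],
    "Privacy": [29, 34, 34, 3, 0],
    "Personal Network": [15, 50, 13, 23, 0],
    "Overall Satisfaction": [17, 49, 17, 15, 2],
}


def mean_score(percentages, scores=(5, 4, 3, 2, 1)):
    """Weighted mean score, normalised by the row total."""
    total = sum(percentages)
    return sum(s * p for s, p in zip(scores, percentages)) / total


for category, row in RESULTS.items():
    print(f"{category:22s} {mean_score(row):.2f}")
```

For example, Capability works out to 3.73 and Overall Satisfaction to 3.64 on the five-point scale.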
Demo
Coincidence ??
SmallBlue Ego Trial Release (8/21)
SmallBlue Find and Connect Trial Release (9/20)
SmallBlue on TAP (11/07)
Acknowledgements
Thanks to the SmallBlue Team Members:
Vicky Griffits-Fisher, Kate Ehrlich, Christopher Desforges, Michael Ackerbaruer, Reynold Khachatourian, Irina Fedulova, Ekaterina Zaytseva, Jeffrey Borden, Jennifer Xu, Yi Gu, Jie Lu, Dima Rekesh, Belle Tseng, Xiaodan Song
Contact: Ching-Yung Lin ([email protected]) ( http://www.research.ibm.com/people/c/cylin )