Near real-time recommendations in enterprise social networks
#AICRECSYS
ADVANSSE: Advances in Social Semantic Enterprise
HTTP://ADVANSSE.DERI.IE/
MACIEJ DABROWSKI, BENJAMIN HEITMANN, CONOR HAYES, KEITH GRIFFIN
10TH JULY 2013
About me: MACIEJ DABROWSKI
[Figure: profile graph; edge labels: name, lecturerAt, researcherAt, graduated, worksWith, co-PI, contact]
Overview
THIS TALK
RESEARCH
INDUSTRY
1. WHY?
2. WHAT?
3. HOW?
4. TECHNICAL DECISIONS
5. LESSONS LEARNED
Why? What? How?
technical considerations
lessons learned
Various information domains
[Figure: user profile spanning information domains TRAVEL, FOOD, SPORTS, POLITICS, ??; labels: preferences, recommendations, implicit connections]
Use Case: Enterprise Social Web
Enterprise social web
[Figure: ENTERPRISE INFORMATION SPACE with departments MARKETING, DEVELOPMENT, R & D and employees ANDREW, BOB, CECILIA, DANNY]
Limited information flow
[Figure: ENTERPRISE INFORMATION SPACE with departments MARKETING, DEVELOPMENT, R & D, employees ANDREW, BOB, CECILIA, DANNY, and posts "GREAT TOOL!", "MEETING IBM", "TALK BY DERI"]
Disconnected Social Networks
[Figure: employees ANDREW, BOB, CECILIA, DANNY in departments MARKETING, DEVELOPMENT, R & D, with an open "?"]
Distributed Social Platforms
[Figure: departments MARKETING, DEVELOPMENT, R & D, with an open "?"]
Problem 1: information overload and discovery
Problem 2: data level issues
DISTRIBUTION
MULTIPLE DOMAINS AND TYPES OF ENTITIES
PEOPLE, INTERESTS, CONTENT
Requirements - personalization
USE BACKGROUND KNOWLEDGE
ALLOW CROSS-DOMAIN MULTI-SOURCE PERSONALIZATION
EXPLOIT SOCIAL GRAPH
ALLOW REAL-TIME APPLICATIONS
Requirements - data
DATA LEVEL:
• FLEXIBLE
• COMPACT
• ENABLE CRUD
• GRAPH?
TRANSPORT PROTOCOL:
• RELIABLE
• EFFICIENT
• PUBSUB?
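The publish/subscribe idea behind the transport requirement can be illustrated with a toy in-memory broker. This is only a sketch of the pattern, not XMPP itself: the `PubSubBroker` class, the node name `marketing/posts` and the payload are hypothetical and do not come from the talk.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class PubSubBroker:
    """Toy in-memory broker; a hypothetical stand-in for a real pubsub service."""

    def __init__(self) -> None:
        # Maps a node (topic) name to the callbacks subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, node: str, callback: Callable[[str], None]) -> None:
        self._subscribers[node].append(callback)

    def publish(self, node: str, payload: str) -> int:
        """Push the payload to every subscriber and return how many were notified."""
        callbacks = self._subscribers.get(node, [])
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


received: List[str] = []
broker = PubSubBroker()
broker.subscribe("marketing/posts", received.append)
delivered = broker.publish("marketing/posts", "TALK BY DERI")
```

Subscribers are notified as soon as something is published, instead of polling each platform, which is what makes the pattern attractive for near real-time use.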
What?
A PLATFORM BASED ON OPEN STANDARDS THAT PLUGS EASILY INTO EXISTING INFRASTRUCTURES AND EXPLOITS LEGACY INFORMATION, THE SOCIAL GRAPH AND THE INTEREST GRAPH TO PROVIDE A PERSONALIZED INFORMATION “DASHBOARD” IN NEAR REAL-TIME.
use cases
HOW? A look inside
Step 1: Exploit distributed (social) graphs
Step 2: Exploit interest graphs
BENEFITS OF USING INTEREST GRAPHS:
1. FLEXIBLE SOURCE OF BACKGROUND KNOWLEDGE
2. ANY DATASET CAN BE “PLUGGED-IN” IF NEEDED
3. CROSS-DOMAIN RECOMMENDATIONS
4. VERY GOOD AT DISCOVERING INTERESTING RECOMMENDATIONS
OUR APPROACH: SPREADING ACTIVATION
Interest graphs
[Figure: example interest graph. Nodes: DERI, Maciej, Maurice, BlogPost2, "Emerging Technology", http://dbpedia.org/resource/Data_analytics, http://dbpedia.org/resource/Emerging_technologies. Edge labels: worksat, sioc:creator_of, sioc:topic, interest, interest recommended, owl:sameAs]
Expanded User Profile (EUP): includes both original and recommended interests
GRAPH LAYERS:
• Social Software Entities
• Additional Profile Knowledge
• External Background Knowledge (DBpedia + domain datasets)
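A minimal sketch of how an Expanded User Profile could be assembled from such a graph. The triples below are loosely modelled on the node and edge labels of the example; which node is connected to which is an assumption, and deriving recommended interests from the topics of a user's own posts is only a stand-in for the real recommender. The helper names `objects`, `post_topics` and `expanded_user_profile` are hypothetical.

```python
# Triples loosely modelled on the example graph; the exact edges are assumptions.
TRIPLES = [
    ("Maciej", "worksat", "DERI"),
    ("Maciej", "sioc:creator_of", "BlogPost2"),
    ("BlogPost2", "sioc:topic", "Emerging Technology"),
    ("Emerging Technology", "owl:sameAs",
     "http://dbpedia.org/resource/Emerging_technologies"),
    ("Maciej", "interest", "http://dbpedia.org/resource/Data_analytics"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def post_topics(user, triples=TRIPLES):
    """Topics of the user's posts, resolved to external resources via owl:sameAs."""
    topics = []
    for post in objects(user, "sioc:creator_of", triples):
        for topic in objects(post, "sioc:topic", triples):
            # Fall back to the local tag when no owl:sameAs link exists.
            topics.extend(objects(topic, "owl:sameAs", triples) or [topic])
    return topics


def expanded_user_profile(user, recommended, triples=TRIPLES):
    """Combine original interests with recommended ones, keeping them apart."""
    original = set(objects(user, "interest", triples))
    return {
        "original": sorted(original),
        "recommended": sorted(set(recommended) - original),
    }


eup = expanded_user_profile("Maciej", post_topics("Maciej"))
```

Keeping original and recommended interests in separate lists preserves the distinction the EUP definition draws between the two.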
Our Approach
A PLATFORM FOR SOCIAL NETWORKS:
§ ENTERPRISE FOCUS: PEOPLE, COMMUNITIES, INFORMATION
§ EFFICIENCY USING XMPP PUBSUB AND SPARQL 1.1 UPDATE
§ EXPLOIT INTEREST GRAPH AND VARIOUS DATA SOURCES TO PROVIDE PERSONALIZATION THROUGH SOPHISTICATED NEAR REAL-TIME RECOMMENDATIONS
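As an illustration of the SPARQL 1.1 Update side, the sketch below builds an `INSERT DATA` request from a list of triples. It only produces the update string; how the platform wraps such an update in an XMPP pubsub message is not described in the slides and is not shown. The `example.org` subject and the helper names `iri`, `term` and `insert_data` are hypothetical.

```python
def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets."""
    return f"<{value}>"


def term(value: str) -> str:
    """Render an object as an IRI if it looks like one, else as a plain literal."""
    if value.startswith(("http://", "https://")):
        return iri(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def insert_data(triples) -> str:
    """Build a SPARQL 1.1 Update INSERT DATA request for the given triples."""
    lines = [f"  {iri(s)} {iri(p)} {term(o)} ." for s, p, o in triples]
    return "INSERT DATA {\n" + "\n".join(lines) + "\n}"


update = insert_data([
    # Hypothetical post URI; the SIOC and DCMI Terms predicates are real vocabularies.
    ("http://example.org/post/2",
     "http://rdfs.org/sioc/ns#topic",
     "http://dbpedia.org/resource/Emerging_technologies"),
    ("http://example.org/post/2",
     "http://purl.org/dc/terms/title",
     "Emerging Technology"),
])
print(update)
```

Because an update carries only the changed triples, each message stays small, which fits the efficiency goal stated above.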
Demonstrator
EASY TO INTEGRATE WITH CISCO INFRASTRUCTURE
OPEN STANDARDS (XMPP, SPARQL 1.1 UPDATE)
SCALABLE: RECOMMENDATIONS BASED ON A SOCIAL GRAPH WITH OVER 10M ENTITIES AND 40M EDGES ARE COMPUTED IN UNDER 1 SECOND (0.2S ON AVERAGE).
MORE DETAILS: HTTP://ADVANSSE.DERI.IE/
demonstrator
Prototype stats
SOCIAL NETWORK GRAPH:
• 100s OF USERS
• 100s OF POSTS
• 500+ TAGS
• 2000+ ENTITIES
• 15000+ EDGES
Saffron.deri.ie
BACKGROUND KNOWLEDGE GRAPH:
• 11M ENTITIES
• 40M EDGES
CROSS-DOMAIN GRAPH:
• 3956 RESEARCH ARTICLES
• LANGUAGE CONFERENCES
Technical considerations
ALGORITHM:
• SEMANTIC NETWORK
• LARGE DATASET
• ITERATIVE GRAPH ALGORITHM
• STATEFUL NODES
• EMBEDDING OF DOMAIN LOGIC
Technical considerations
NON-NATIVE IMPORT OF RDF
STARTUP TIME WITH DBPEDIA
• 12 MIN TO LOAD ON 24 CORES, 96GB RAM
PARALLEL PROCESSING OF ACTIVATIONS
• STATE FOR EACH USER AT EACH NODE
SCALABILITY ISSUES
LACK OF GLOBAL ALGORITHM CONTROL
IMMATURE CODE BASE, LACK OF DOCUMENTATION
Technical considerations
NATIVE SUPPORT FOR RDF
DBPEDIA (5.46GB) COMPRESSED TO 436MB
LOW MEMORY REQUIREMENTS
LOW STARTUP TIME (90S)
FAST QUERY ACCESS (< 1ms)
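The size figures above imply a compression ratio of roughly 12.5 to 1. The short calculation below shows the arithmetic; reading 1 GB as 1000 MB is an assumption, and with 1024 MB per GB the ratio comes out slightly higher.

```python
# Sizes quoted on the slide; decimal units (1 GB = 1000 MB) are an assumption.
original_mb = 5.46 * 1000
compressed_mb = 436

ratio = original_mb / compressed_mb   # how many times smaller the compressed file is
share = compressed_mb / original_mb   # compressed size as a fraction of the original

print(f"compression ratio ~{ratio:.1f}x, compressed size ~{share:.0%} of the original")
```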
Server design
XMPP, SPREADING ACTIVATION, HDT
[Figure: server architecture]
ADVANSSE connected social platform
• XMPP client: Ignite Smack
• Web application: Tomcat + Servlet
• RDF store: Jena Fuseki
ADVANSSE server
• Personalisation component
• Recommendation algorithm
• R/W RDF store: Jena Fuseki
• XMPP server: Ignite OpenFire
• XMPP client: Ignite Smack
• Fast, R/O RDF store: HDT
• Link resolver RDF store: Jena Fuseki
CONNECTOR LABELS: XMPP, SPARQL, Java API, SPARQL + Java API, File import
• DISTANCE CONSTRAINT DISABLED
• FANOUT CONSTRAINT ENABLED
• 10 TARGET ACTIVATIONS
• ACTIVATION THRESHOLD 0.5
• INITIAL ACTIVATION 4.0
• MAXIMUM OUT EDGES 500
• MAXIMUM OF 10 WAVES AND 1 PHASE
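To make the parameters concrete, here is a minimal sketch of a spreading activation pass that uses the values listed above. It is not the ADVANSSE implementation: the slides do not define the exact semantics, so this sketch assumes that the fan-out constraint splits a node's energy evenly among its neighbours, that nodes with more than the maximum number of out edges are not expanded, that "target activations" caps the number of returned nodes, and that a single phase means one pass of waves. The graph and the function name `spread_activation` are hypothetical.

```python
from collections import defaultdict

# Values from the configuration list; their interpretation is an assumption.
CONFIG = {
    "initial_activation": 4.0,
    "activation_threshold": 0.5,
    "max_out_edges": 500,
    "max_waves": 10,
    "target_activations": 10,
}


def spread_activation(graph, seeds, config=CONFIG):
    """Return up to `target_activations` non-seed nodes ranked by activation."""
    total = defaultdict(float)  # activation accumulated per node
    frontier = {seed: config["initial_activation"] for seed in seeds}
    fired = set()  # nodes that have already passed energy on

    for _ in range(config["max_waves"]):
        incoming = defaultdict(float)
        for node, energy in frontier.items():
            if node in fired or energy < config["activation_threshold"]:
                continue
            fired.add(node)
            neighbours = graph.get(node, [])
            if not neighbours or len(neighbours) > config["max_out_edges"]:
                continue  # skip dead ends and overly connected hubs
            share = energy / len(neighbours)  # fan-out: split energy evenly
            for neighbour in neighbours:
                if neighbour not in fired:
                    incoming[neighbour] += share
        if not incoming:
            break  # nothing left to propagate
        for node, energy in incoming.items():
            total[node] += energy
        frontier = incoming

    ranked = sorted(
        ((node, value) for node, value in total.items() if node not in seeds),
        key=lambda item: (-item[1], item[0]),
    )
    return ranked[: config["target_activations"]]


# Hypothetical toy graph: a user, one interest, one post and a shared topic.
graph = {
    "Maciej": ["Data_analytics", "BlogPost2"],
    "Data_analytics": ["Emerging_technologies"],
    "BlogPost2": ["Emerging_technologies"],
}
recommendations = spread_activation(graph, seeds={"Maciej"})
```

In this toy graph the shared topic collects energy along two paths and therefore ranks first, which is the effect that lets the technique surface items connected to a user through several independent routes.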
stats
DATASET:
• 371 USERS
• 6 INTERESTS ON AVERAGE
• DEGREE 2-5, UP TO 51
AVERAGE EXECUTION: 200ms
COVERAGE: 85%
The value
SOCIAL CAPITAL IN ENTERPRISE SOCIAL NETWORKS IS NOT FULLY EXPLOITED.
ENTERPRISE SOCIAL PLATFORMS ARE DISTRIBUTED AND INCLUDE VARIOUS SOURCES OF INFORMATION.
VALUABLE INFORMATION IN AN ORGANIZATION IS NOT DISCOVERED BY THE RELEVANT EMPLOYEES.
DISCOVER AND CONNECT WITH RELEVANT PEOPLE IN THE ORGANIZATION.
AGGREGATE INFORMATION FROM VARIOUS DISTRIBUTED SOCIAL PLATFORMS USING OPEN STANDARDS.
PROVIDE NEAR REAL-TIME PERSONALIZATION BASED ON LARGE, DYNAMIC GRAPH DATA.
Lessons learned
• GREATER RELEVANCE TO REAL PROBLEMS
• CLEARER REQUIREMENTS (AND MORE)
• ACCESS TO ACTUAL USAGE DATA (REAL USERS)
• PATENTS VS. PUBLISHING
• PROTOTYPE INTEGRATION CONSUMES RESOURCES
• MORE FOCUS ON FEATURE DEVELOPMENT
• LESS EXPLORATION AND HYPOTHESIS TESTING
major considerations
ACCESS TO INDUSTRY DATA
INTEGRATION WITH THE PRODUCT?
Summary
PROBLEM
§ INFORMATION OVERLOAD AND INEFFICIENT INFORMATION DISCOVERY IN DISTRIBUTED ENTERPRISE SOCIAL NETWORKS
SOLUTION
§ RECOMMENDER SYSTEM THAT EXPLOITS SOCIAL GRAPH
§ UTILIZE INTEREST GRAPH AND LEGACY INFORMATION
§ NEAR REAL-TIME PERSONALIZATION
TECHNOLOGY
§ OPEN SOURCE COMPONENT FOR RDF DATA AGGREGATION USING XMPP AND SPARQL 1.1 UPDATE
§ PERSONALIZATION COMPONENT BASED ON SPREADING ACTIVATION APPLICABLE TO MULTI-SOURCE, CROSS-DOMAIN DATA