Semantic Analysis of User Browsing Patterns in the Web of Data @USEWOD
KIT – University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association www.kit.edu
Enabling Semantic Analysis of User Browsing Patterns in the Web of Data
M.Sc. Julia Hoxha Institute of Applied Informatics and Formal Description Methods (AIFB) Karlsruhe Institute of Technology
USEWOD Workshop @WWW2012 Lyon, France
Paper
Hoxha, J., Junghans, M., and Agarwal, S. (2012). Enabling Semantic Analysis of User Browsing Patterns in the Web of Data. In 2nd International Workshop on Usage Analysis and the Web of Data (USEWOD), 21st International World Wide Web Conference (WWW2012), Lyon, France, vol. CoRR, abs/1204.2713.
http://arxiv.org/abs/1204.2713
Outline
Introduction
Framework for Behavior Analysis
Semantic Modeling of Cross-site Browsing Behavior
Web Browsing Activity Model (WAM)
Formalization Approach
Querying Behavioral Patterns
Evaluation
Conclusions
3 J. Hoxha – USEWOD Workshop, Lyon, 2012
Introduction
Understanding user behavior in accessing Web resources helps site providers/domain experts:
• Discover user preferences or detect bottlenecks
• Build adaptive Web sites
• Make appropriate recommendations to users, etc.
How to facilitate the analysis of usage patterns?
• Provide formal, semantic description of usage logs
• Offer techniques to expressively query patterns
HTTP Requests of Usage Logs
ID  Time           User Action
1   [17:11:49:21]  http://www.google.de/search?q=Lyon+www2012
1   [17:11:49:33]  http://dbpedia.org/page/Lyon
1   [17:11:49:39]  http://data.semanticweb.org/conference/www/2011/demo/a-demo-search-engine-for-products
[Figure: excerpt of the SWDF Domain Ontology. Surviving labels: InProceedings, swrc:ConferenceEvent, swrc:Proceedings, swrc:Publication, foaf:Person, dbpedia:PopulatedPlace, literal; relations dc:creator, isA, ns2:relatedToEvent, ns1:name, ns3:based_near]
Modeling and Analysis Framework
[Figure: framework diagram with three stages. Surviving labels:
Monitoring: Web Browsing Behavior Monitoring System, User 1 ... User n, Cross-site Browsing Activities
Formalization: Selection, Target Data, Preprocessing, Preprocessed Data, Transformation, Transformed Data, Semantic Formalization, Annotation with Domain Ontology, Domain Ontologies, Browsing Activity Formalization, Semantic Activity Models, Repository
Analysis: Pattern Mining, Querying Capabilities]
User session of browsing events, s: <l1, l2, l3, ..., ln>
Event e1 = (A1, I1, t1): Type Ai = {content, function}, Input I1 = {i1,...,ik}, URL l1, Time t1
Event en = (An, In, tn): Type An, Input In = {i1,...,ik}, URL ln, Time tn
Definitions
Event
• l full URL invoked, T types, P parameters, t timestamp
Event types
• Tc content type of an event
• Tf function type of an event
Session
• s is an ordered sequence of events e_i, s.t. i is the event order in s
• Ts start time and Te end time, s.t. ...
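The definitions above can be pictured as a small data model. This is a minimal sketch, not the authors' implementation: the class and field names are hypothetical, and taking Ts and Te as the times of the first and last event is an assumption, since the constraint on Ts and Te is not preserved in the slides. The timestamps and type labels in the example are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Set


@dataclass
class Event:
    url: str                                                # l: full URL invoked
    time: datetime                                          # t: timestamp
    content_types: Set[str] = field(default_factory=set)    # Tc
    function_types: Set[str] = field(default_factory=set)   # Tf
    params: Dict[str, str] = field(default_factory=dict)    # P


@dataclass
class Session:
    events: List[Event]

    def __post_init__(self) -> None:
        # Keep events in temporal order so list position i is the event order.
        self.events = sorted(self.events, key=lambda e: e.time)

    @property
    def start(self) -> datetime:
        # Ts, assumed here to be the time of the first event.
        return self.events[0].time

    @property
    def end(self) -> datetime:
        # Te, assumed here to be the time of the last event.
        return self.events[-1].time


# URLs are taken from the log excerpt; timestamps and types are illustrative.
session = Session([
    Event("http://dbpedia.org/page/Lyon",
          datetime(2012, 4, 17, 11, 49, 33),
          content_types={"PopulatedPlace"}),
    Event("http://www.google.de/search?q=Lyon+www2012",
          datetime(2012, 4, 17, 11, 49, 21),
          function_types={"SearchEngine"},
          params={"q": "Lyon+www2012"}),
])
```

Sorting in `__post_init__` means the caller can pass log rows in any order and still get a session whose index matches the event order.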
Web Browsing Activity Model (WAM)
[Figure: WAM ontology diagram. Surviving labels:
Classes: wam:Session, wam:Event, wam:StartEvent, wam:EndEvent, wam:EventType, wam:FunctionType, wam:ContentType, wam:EventURL, wam:BaseURL, wam:Parameter, wam:InputVariable, wam:OutputVariable, wam:User, event:Event, time:TemporalEntity, time:Instant, time:Interval, Literal
Properties: wam:hasEvent, wam:hasStartEvent, wam:hasEndEvent, wam:hasUser, wam:userID, wam:userIP, wam:hasTime, wam:inInterval, wam:eventURL, wam:fullURL, wam:baseURL, wam:hasInput, wam:hasParameter, wam:hasName, wam:hasValue, wam:functionType, wam:contentType, wam:order, rdfs:subClassOf
Annotations: Domain Ontology used for semantic enrichment; Based on function and content; owa:Parameter Name
Example URLs: http://www.avis.com/car-rental/reservation/start-reservation.ac?resForm.pickUpLocation=Lyon and http://data.semanticweb.org/person/julia-hoxha]
Prefixes:
wam: <http://greenlinkeddata.org/wam.owl#>
time: <http://www.w3.org/2006/time#>
event: <http://purl.org/NET/c4dm/event.owl#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdfs: <http://www.w3.org/2000/01/rdf-schema#>
Formalization Approach
Formalization based on WAM ontology
• Step 1. Semantic Enrichment
• Step 2. Extend Knowledge Base (ABox assertions for events & domain ontology)
• Step 3. RDF Serialization
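As a rough illustration of Steps 2 and 3, the sketch below emits N-Triples for a single event using the WAM namespace. The property names (wam:hasEvent, wam:fullURL, wam:contentType) appear in the WAM diagram, but which class each property attaches to is an assumption here, and the structure is flattened (the diagram also shows a separate wam:EventURL class). The session and event URIs under example.org are hypothetical.

```python
WAM = "http://greenlinkeddata.org/wam.owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(subject, predicate, obj, literal=False):
    """Format one N-Triples line; obj is an IRI unless literal=True."""
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        rendered = '"%s"' % escaped
    else:
        rendered = "<%s>" % obj
    return "<%s> <%s> %s ." % (subject, predicate, rendered)


def serialize_event(session_uri, event_uri, full_url, content_type_uri):
    """ABox assertions for one event (assumed, flattened property usage)."""
    return [
        triple(event_uri, RDF_TYPE, WAM + "Event"),
        triple(session_uri, WAM + "hasEvent", event_uri),
        triple(event_uri, WAM + "fullURL", full_url, literal=True),
        triple(event_uri, WAM + "contentType", content_type_uri),
    ]


# Hypothetical identifiers for the session and the event.
lines = serialize_event(
    "http://example.org/session/1",
    "http://example.org/session/1/event/3",
    "http://data.semanticweb.org/person/julia-hoxha/html",
    "http://xmlns.com/foaf/0.1/Person",
)
```

Joining `lines` with newlines gives a file that an RDF store can load; a real pipeline would more likely use an RDF library than hand-formatted strings.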
Semantic Enrichment
• For each link in logs, find URI of Web resource
• Find RDF representation of the resource (via a Mapping Template)
e.g. SWDF: http://data.semanticweb.org/person/julia-hoxha/html - HTML
http://data.semanticweb.org/person/julia-hoxha - URI
http://data.semanticweb.org/person/julia-hoxha/rdf - RDF/XML
• Extract ontology classes to which it belongs – used as ContentType of event (Person, ResearchGroup, Publication, MusicGroup, etc.)
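The SWDF example above suggests a simple mapping template: strip the representation suffix to get the resource URI, then append /rdf to reach the RDF/XML document. The function below is a minimal sketch of that one template; the function name is hypothetical, and other sites would need their own templates.

```python
def swdf_mapping(logged_url):
    """Derive the resource URI and its representations from a logged SWDF URL."""
    base = logged_url.rstrip("/")
    # Remove a trailing representation suffix, if the log recorded one.
    for suffix in ("/html", "/rdf"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return {"uri": base, "html": base + "/html", "rdf": base + "/rdf"}


mapping = swdf_mapping("http://data.semanticweb.org/person/julia-hoxha/html")
```

The classes found in the document at `mapping["rdf"]` would then serve as the ContentType of the event.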
Semantic Analysis
Querying with semantic constraints
Address also temporal constraints
regarding the dynamics of user browsing behavior
Example: In how many sessions within Mar-Apr 2011 did users search in Google and afterwards visit a page in SWDF?
Various levels of abstraction, e.g.:
• instead of Google -> any search engine
• instead of any page -> a WWW2011 page
• or even higher abstraction -> a Conference page
[Figure: session s: <e1, ..., e2, ef> with event attributes e1.time, e1.urlBase, e1.type; type labels „WWW2011“ and „Conference“ connected by isA]
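Querying at different abstraction levels amounts to asking whether an event's type is subsumed by the type named in the query. The sketch below uses a hand-written isA table in place of an ontology reasoner; the table contents and function names are illustrative assumptions, not the SWDF ontology.

```python
# Toy isA hierarchy (child -> parent); entries are illustrative only.
IS_A = {
    "WWW2011": "Conference",
    "Google": "SearchEngine",
}


def ancestors(concept):
    """Return the concept followed by everything it is transitively a kind of."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def matches(event_type, query_type):
    """True if an event of event_type satisfies a query for query_type."""
    return query_type in ancestors(event_type)
```

With this table, a query for "Conference" matches an event typed "WWW2011", while a query for "WWW2011" does not match an event that is only known to be a "Conference".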
Temporal Constraints
Consider real time (timestamps) and abstract time (order of events) to query usage patterns
Q: find sessions with start time Ts and end time Te containing an event e1 with URL www.ex1.org, eventually succeeded by another event e2 in the session with URL www.ex2.org
We address temporal logics capable of ontological reasoning
• apply temporal operators, e.g. next, eventually, always (based on Linear Temporal Logic, LTL)
• query formulated as an LTL formula extended with DL axioms
LTL formula in a state transition system (LTL + DL: proposition A is a set of ABox assertions)
• next: A is true at the next state after the initial state s1
• eventually: A is true at some state on the path
• always: A is true at all states along the path
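The three operators can be illustrated over a finite path of states. This is a simplified sketch: LTL is normally defined over infinite paths, and here each state is just a set of strings standing in for ABox assertions, with membership replacing DL entailment. The function names and the example path are hypothetical.

```python
def next_(prop, path, i=0):
    """prop holds at the state right after position i."""
    return i + 1 < len(path) and prop in path[i + 1]


def eventually(prop, path, i=0):
    """prop holds at some state from position i onward."""
    return any(prop in state for state in path[i:])


def always(prop, path, i=0):
    """prop holds at every state from position i onward."""
    return all(prop in state for state in path[i:])


# One state per event; each state lists the assertions true of that event.
path = [{"SearchEngine"}, {"Publication"}, {"Publication", "Conference"}]
```

For this path, `next_("Publication", path)` and `eventually("Conference", path)` hold, while `always("Publication", path)` fails because the first state is a search-engine event.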
DL-LTL Query Formulation
Queries formulate
• 1) certain conditions on the session itself
• 2) temporal patterns in the events within the session
Query Q(s):
1) Conditions on the session itself: find sessions with start time Ts and end time Te
2) Temporal patterns within a session, expressed as a DL-LTL formula: e.g. containing an event e1 with content type “publication”, eventually succeeded by another event e2 with function type “search engine”
Query Answering Approach
Step 1. Check constraints on the session itself
Step 2. Verify temporal constraints applying model checking technique
Iterate over sessions S={S1, S2,…,Sn}
(a) build a finite state automaton (FSA) for each Si
(b) verification of DL-LTL formula
iterate over the states of FSA to determine whether a condition holds in the respective state
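The two steps above can be sketched as a filter followed by a scan over each session's events. This is not the authors' model checker: the scan is a hand-coded two-state automaton for one pattern ("an event satisfying `first`, eventually succeeded by one satisfying `second`"), type checks are plain set membership rather than OWL reasoning, and the session layout, identifiers, and timestamps are made up for illustration.

```python
from datetime import datetime


def in_window(session, ts, te):
    """Step 1: condition on the session itself."""
    return ts <= session["start"] and session["end"] <= te


def eventually_followed_by(events, first, second):
    """Step 2: scan the states in order; succeed once `second` follows `first`."""
    seen_first = False
    for ev in events:
        if seen_first and second(ev):
            return True
        if first(ev):
            seen_first = True
    return False


def answer(sessions, ts, te, first, second):
    return [s["id"] for s in sessions
            if in_window(s, ts, te)
            and eventually_followed_by(s["events"], first, second)]


def ev(content=(), function=()):
    return {"content": set(content), "function": set(function)}


sessions = [
    # Matches: publication, later a search engine, inside the window.
    {"id": "S1", "start": datetime(2011, 3, 5, 10, 0), "end": datetime(2011, 3, 5, 10, 20),
     "events": [ev(content=["Publication"]), ev(), ev(function=["SearchEngine"])]},
    # Wrong order: search engine first, publication afterwards.
    {"id": "S2", "start": datetime(2011, 4, 2, 9, 0), "end": datetime(2011, 4, 2, 9, 5),
     "events": [ev(function=["SearchEngine"]), ev(content=["Publication"])]},
    # Right pattern, but outside the queried time window.
    {"id": "S3", "start": datetime(2011, 6, 1, 8, 0), "end": datetime(2011, 6, 1, 8, 30),
     "events": [ev(content=["Publication"]), ev(function=["SearchEngine"])]},
]

result = answer(
    sessions,
    datetime(2011, 3, 1), datetime(2011, 4, 30, 23, 59, 59),
    first=lambda e: "Publication" in e["content"],
    second=lambda e: "SearchEngine" in e["function"],
)
```

Checking the session conditions first keeps the more expensive per-state work off sessions that can never match.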
Evaluation
Validate feasibility of the formalization approach
Show feasibility of the query answering approach
• Query sessions with different patterns
• Measure performance
Formalization
                     SWDF 2009             DBPedia 3-3
Monitoring Period    01.Jul.09-12.Jul.09   01.Jul.09-12.Jul.09
avg. #sessions/day   235.9                 2899
#sessions            2831                  31893
[Figures: pie charts "SWDF 2009: % of sessions initiated in the domain" and "DBPedia 2009: % of sessions initiated in the domain"; surviving labels: Bing 2.7%, Google 97%]
• Only 1.46% of daily sessions contain SPARQL queries
Evaluation (II)
Querying
• answering time varies slightly across the queries (~0.15 seconds)
• for up to 1000 sessions, answering time is below 1.4 seconds
• model checking time is small
• OWL reasoning takes ~94% of the overall answering time
[Figure: answering time (sec) vs. nr. sessions; surviving series label: Q1]
Conclusions
Propose a framework for behavior modeling and analysis:
• Approach for semantic formalization of logs
• Techniques of querying patterns with temporal and semantic constraints
Challenges and Future Work
• Find datasets of client-side navigation logs at multiple sites
• Domain Ontology acquisition
• Classification Techniques to find FunctionType
• Optimization of Query Answering