Characterizing Machine Agent Behavior through SPARQL Query Mining

Aravindan Raghuveer, Yahoo! Inc, Bangalore.


Mining SPARQL queries to understand the behavior of automated programs (or machine agents) is an important step in designing systems for the semantic web. We present techniques that differ from state-of-the-art SPARQL mining techniques in two ways: (1) we move away from a one-SPARQL-query-at-a-time view to a SPARQL user session view, and (2) we look at the results of SPARQL queries in addition to the query itself. Due to these two approaches, we are able to find two new patterns in SPARQL queries that help us reason better about the underlying program that generated the SPARQL queries. Through a variety of experiments, we show that the patterns found have significant support in all four datasets provided by the USEWOD committee.

Transcript of Characterizing Machine Agent Behavior through SPARQL Query Mining

Page 1: Characterizing Machine Agent Behavior through SPARQL Query Mining

Characterizing Machine Agent Behavior through SPARQL Query Mining

Aravindan Raghuveer
Yahoo! Inc, Bangalore.

Page 2: Characterizing Machine Agent Behavior through SPARQL Query Mining

Yahoo! Confidential

Introduction: LOD Users

The LOD cloud has two types of users:
- Humans (browsers).
- Programs / machine agents.


Page 3: Characterizing Machine Agent Behavior through SPARQL Query Mining


Introduction: LOD Access Methods


The data on the LOD cloud can be accessed in multiple ways.

For this work, we categorize them into two buckets:
- SPARQL: A powerful declarative graph query language
- Non-SPARQL: Direct linked data requests.

Page 4: Characterizing Machine Agent Behavior through SPARQL Query Mining


Motivation: User Behavior Understanding

A deep understanding of client behavior can help build “better” serving systems.

Better:
- Secure
- Scalable
- Available

Prior Work:
- Moller et al., WebSci 2010
- Picalausa et al., SWIM 2011
- Kirchberg et al., USEWOD 2011
- Mario et al., USEWOD 2011

Page 5: Characterizing Machine Agent Behavior through SPARQL Query Mining


Summarizing. . .


           | Human Users | Machine Agents
Non-SPARQL |             |
SPARQL     |             | This paper’s focus

Page 6: Characterizing Machine Agent Behavior through SPARQL Query Mining


What is this paper about?

Mining of the USEWOD query log dataset to identify:

- Two Trends in Machine Agent Querying

- Two Patterns in Machine Agent Querying


Page 7: Characterizing Machine Agent Behavior through SPARQL Query Mining


The USEWOD dataset

Query logs of servers hosting a part of LOD cloud data.

Dataset | Type                 | # records (million) | % SPARQL
bio2rdf | Life sciences        | ~0.2                | 100%
lgd     | Geo                  | ~1.9                | 100%
SWDF    | Conference           | ~16.7               | 43.38%
dbpedia | Structured Wikipedia | ~36.2               | 46.9%

Page 8: Characterizing Machine Agent Behavior through SPARQL Query Mining


Part-1: Two Trends in Machine Agent Querying

The Theme

“What are the overarching trends for SPARQL queries?”


Page 9: Characterizing Machine Agent Behavior through SPARQL Query Mining


Trend-1: SPARQL is here to stay!

[Figure panels: SWDF, Dbpedia. Annotation: 0.1 – 1 million]

Take-away: SPARQL query volume is significant.

Page 10: Characterizing Machine Agent Behavior through SPARQL Query Mining


Trend-2: SPARQL is heavily used by machine agents.


Took 17 million user agents from SPARQL queries from dbpedia and ..

Page 11: Characterizing Machine Agent Behavior through SPARQL Query Mining


Part-2: Two Patterns in Machine Agent Querying

The Theme

“Looking at SPARQL query logs, can we reason about the program that generated the queries?”


Page 12: Characterizing Machine Agent Behavior through SPARQL Query Mining


Salient aspects of proposed Query Mining Techniques

Move from per query analysis to query session analysis

Move from query analysis to query result analysis


Page 13: Characterizing Machine Agent Behavior through SPARQL Query Mining


Pattern-1: Loops in Programs

Take-away

• Through per-user, temporal mining of the logs, we discover patterns that are caused by loops in programs.

• Significant support in all 4 datasets


Page 14: Characterizing Machine Agent Behavior through SPARQL Query Mining


Per-user Temporal mining

[Figure: original logs (queries from User-1, User-2, User-3, User-4 along the TIME axis) regrouped by user-level session analysis; a loop is marked.]
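The regrouping step can be sketched roughly as follows. This is only an illustration: the record layout `(user, timestamp, query)` and the function name `sessions_by_user` are assumptions, and the slides do not say how a "user" is identified in the logs.

```python
from collections import defaultdict


def sessions_by_user(records):
    """Group (user, timestamp, query) records per user, ordered by time.

    The record layout is a hypothetical one chosen for this sketch.
    """
    per_user = defaultdict(list)
    for user, timestamp, query in records:
        per_user[user].append((timestamp, query))
    # Sort each user's queries by timestamp so successive queries are adjacent.
    return {
        user: [query for _, query in sorted(items, key=lambda item: item[0])]
        for user, items in per_user.items()
    }


# Interleaved log, as in the "Original Logs" view.
log = [
    ("User-1", 1, "Q_a"),
    ("User-2", 2, "Q_x"),
    ("User-1", 3, "Q_b"),
    ("User-2", 4, "Q_y"),
    ("User-1", 5, "Q_c"),
]
sessions = sessions_by_user(log)
```

Each value in `sessions` is one user's query sequence in time order, which is the input the loop analysis works on.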

Page 15: Characterizing Machine Agent Behavior through SPARQL Query Mining


Intra Pattern Loop

Successive queries from the same user use the same “template”.

Example: Two successive queries:

SELECT * WHERE { <http://bio2rdf.org/dr:D00332> <http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/cas:54-47-7> }

SELECT * WHERE { <http://bio2rdf.org/dr:D00333> <http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/cas:54-47-7> }

Only the subject (D00332, D00333) varies.

Page 16: Characterizing Machine Agent Behavior through SPARQL Query Mining


Detecting Intra Pattern Loop

We convert a query to its canonical form by replacing variables, URIs, and literals with “keywords”.


SELECT * WHERE { <http://bio2rdf.org/dr:D00332> <http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/cas:54-47-7> }

Canonical Form of the previous queries: SELECT * WHERE { _URI_ _URI_ _URI_ }

Queries generated by the same template will have the same canonical form.
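A minimal sketch of such a canonicalization, using regular expressions, might look like the following. Only the `_URI_` keyword appears in the slides; the `_VAR_` and `_LITERAL_` keywords, the function name `canonical_form`, and the simplified token patterns are assumptions made for illustration, and a full SPARQL parser would be more robust.

```python
import re

# One pass over the query: at each position the first alternative that
# matches wins, so "?" or digits inside a URI or string stay untouched.
_TOKEN = re.compile(
    r'(?P<uri><[^<>"{}|^`\\\s]*>)'      # <http://...>
    r'|(?P<str>"(?:[^"\\]|\\.)*")'      # "double-quoted string"
    r'|(?P<var>[?$]\w+)'                # ?x or $x
    r'|(?P<num>\b\d+(?:\.\d+)?\b)'      # 10, 3.5 (e.g. LIMIT / OFFSET values)
)

# Keyword names other than _URI_ are hypothetical.
_KEYWORDS = {"uri": "_URI_", "str": "_LITERAL_", "num": "_LITERAL_", "var": "_VAR_"}


def canonical_form(query):
    """Replace URIs, literals and variables by keywords; normalize whitespace."""
    replaced = _TOKEN.sub(lambda m: _KEYWORDS[m.lastgroup], query)
    return " ".join(replaced.split())


q1 = ("SELECT * WHERE { <http://bio2rdf.org/dr:D00332> "
      "<http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/cas:54-47-7> }")
q2 = ("SELECT * WHERE { <http://bio2rdf.org/dr:D00333> "
      "<http://bio2rdf.org/ns/bio2rdf#xRef> <http://bio2rdf.org/cas:54-47-7> }")
same_template = canonical_form(q1) == canonical_form(q2)
```

Here `q1` and `q2` are the two example queries with their URIs written in angle brackets, and both map to `SELECT * WHERE { _URI_ _URI_ _URI_ }`.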

Page 17: Characterizing Machine Agent Behavior through SPARQL Query Mining


Salient Aspects of Intra Pattern loops

Iterate over a dictionary of values (categorical)

Iterate over a numerical range (example LIMIT, OFFSET parameters in SPARQL queries)

Multiple levels of nested loops with the same intra loop pattern.

4 parameters to quantify the above (in paper)
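One simple way to surface intra pattern loops in a single user's session is to look for runs of successive queries that share a canonical form. The sketch below assumes the session has already been reduced to a list of canonical forms; the function name and the `min_length` threshold are assumptions, not the four parameters defined in the paper.

```python
from itertools import groupby


def intra_pattern_loops(canonical_session, min_length=2):
    """Return (canonical_form, run_length) for each run of a repeated template.

    `min_length` is a hypothetical threshold for calling a run a loop.
    """
    runs = []
    # groupby groups only adjacent equal items, i.e. successive queries.
    for form, group in groupby(canonical_session):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((form, length))
    return runs


session = ["A", "A", "A", "B", "C", "C"]
loops = intra_pattern_loops(session)
```

For this toy session the runs found are template `A` repeated three times and template `C` repeated twice; the single `B` is not counted as a loop.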

Page 18: Characterizing Machine Agent Behavior through SPARQL Query Mining


Inter Pattern Loops

Found loops that iterate over a set of patterns

P1, P2, P3, P1, P2, P3, P1, P2, P3

Typically used when the output of the first query goes as a parameter to the second query.

(examples in paper)
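As a rough illustration of spotting a sequence such as P1, P2, P3 repeated several times, the sketch below checks whether a window of canonical forms is an exact repetition of a shorter block. The function name, the `min_repeats` threshold, and the requirement that the whole window tiles exactly are assumptions for this sketch and not the detection method used in the paper.

```python
def inter_pattern_loop(forms, min_repeats=2):
    """Return the shortest block of patterns that tiles `forms`, else None.

    The block must contain at least two different patterns; a block of one
    repeated pattern would be an intra pattern loop instead.
    """
    n = len(forms)
    for k in range(2, n // min_repeats + 1):
        if n % k != 0:
            continue
        block = forms[:k]
        if len(set(block)) > 1 and all(forms[i] == block[i % k] for i in range(n)):
            return block
    return None


window = ["P1", "P2", "P3"] * 3
block = inter_pattern_loop(window)
```

In practice the check would be applied to sliding windows of a user's session rather than to the whole session at once.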

Page 19: Characterizing Machine Agent Behavior through SPARQL Query Mining


Results

[Figure: support for loops per dataset (bio2rdf, lgd, swdf, dbpedia); values shown: 86%, 32%, 40%, 16%]

Take-away: Significant support for loops!

Page 20: Characterizing Machine Agent Behavior through SPARQL Query Mining


Pattern-2: Querying for dbpedia Linkage

Take-away:

• By executing each query and analyzing the results, we find that a portion of queries “look” for dbpedia links.

• Results:
- Over 20 months of SWDF queries, an average of 8% look for dbpedia URLs.
- Over 2 days' worth of lgd queries, 26.5% look for dbpedia URLs.
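The result-side check can be sketched as below, assuming each query has already been executed and its results are available as a list of rows mapping variable names to string values. The row format, the function names, and the rule "any result value starts with `http://dbpedia.org/`" are assumptions; the slides do not spell out the exact criterion.

```python
DBPEDIA_PREFIX = "http://dbpedia.org/"  # assumed marker of a dbpedia URL


def looks_for_dbpedia(result_rows):
    """True if any value in any result row is a dbpedia URL."""
    return any(
        isinstance(value, str) and value.startswith(DBPEDIA_PREFIX)
        for row in result_rows
        for value in row.values()
    )


def dbpedia_share(results_per_query):
    """Fraction of queries whose results contain at least one dbpedia URL."""
    if not results_per_query:
        return 0.0
    hits = sum(1 for rows in results_per_query if looks_for_dbpedia(rows))
    return hits / len(results_per_query)


# Toy results for four executed queries (hypothetical data).
results = [
    [{"o": "http://dbpedia.org/resource/Bangalore"}],
    [{"o": "http://example.org/paper/1"}],
    [],
    [{"s": "http://example.org/a", "o": "42"}],
]
share = dbpedia_share(results)
```

On this toy input one query out of four returns a dbpedia URL, so the share is 0.25.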


Page 21: Characterizing Machine Agent Behavior through SPARQL Query Mining


Summary & Conclusions

Proposed 2 new ways of SPARQL query mining:
- Session view
- Analyze results in addition to query

Showed that machine agents look for dbpedia links using the owl:sameAs annotation.


Influence on system design:
- Can we pre-fetch elements in a loop beforehand?
- Prioritize dbpedia attributes for caching

Influence on log collection & analysis:
- Stratified random sampling to remove the effect of loops.
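One possible reading of the stratified sampling idea is sketched below: queries are grouped into strata and a fixed number is drawn from each, so a long loop contributes no more than any other template. Using `(user, canonical form)` as the stratum, the record layout, and the function name are all assumptions; the slides do not define the strata.

```python
import random
from collections import defaultdict


def stratified_sample(records, per_stratum=1, seed=0):
    """Draw up to `per_stratum` queries from each (user, canonical form) stratum.

    `records` are hypothetical (user, canonical_form, query) tuples.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for user, form, query in records:
        strata[(user, form)].append(query)
    sample = []
    for key in sorted(strata):
        queries = strata[key]
        sample.extend(rng.sample(queries, min(per_stratum, len(queries))))
    return sample


# A 100-iteration loop over one template plus a single one-off query.
records = [("u1", "T_loop", "loop query %d" % i) for i in range(100)]
records.append(("u1", "T_other", "one-off query"))
sample = stratified_sample(records)
```

With one draw per stratum, the 100 loop iterations and the one-off query each contribute a single query to the sample.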

Page 22: Characterizing Machine Agent Behavior through SPARQL Query Mining


For the great data!!
For the great feedback & comments
For listening!

Page 23: Characterizing Machine Agent Behavior through SPARQL Query Mining


The famous LOD Cloud . . .

7 billion triples and counting!!