BEAR: Mining Behaviour Models from User-Intensive Web Applications

Mining Behaviour Models from User-Intensive Web Applications

Carlo Ghezzi, Politecnico di Milano, Italy (IT)

Mauro Pezzè, Università della Svizzera Italiana, Lugano (CH)

Michele Sama, Touchtype Ltd, UK

Giordano Tamburrelli, Università della Svizzera Italiana, Lugano (CH)


Modern Web Applications

• Scalability: millions of interactions per day

• Privacy: manage sensitive data

• Security: secure economic transactions

• Users: capture and measure user behaviours

User behaviours

• User behaviours cannot be predicted at design time.

• Only released applications allow us to collect statistics.

• Multiple, heterogeneous navigational behaviours depend on several factors.

• Behaviours may change unpredictably over time.

Related work

• Google Analytics, link prediction, web caching

• Monitoring + analysis/mining

• Little support from a general software engineering perspective

What is missing

• General abstraction to support software engineers

• Automated and unambiguous analysis tool

• Support for different user classes

• Other key features:

• extensibility (domain-specific analysis)

• incrementality

• applicability to legacy systems

Our Idea

Formal Methods + Web Development

• Exploit formal models to capture and quantitatively analyse user behaviours

• Focus on RESTful architectures

• Based on log file mining, so applicable to legacy systems

Ingredients

• User classes

• Give semantics to events in the log file

• Infer user-behaviour models: discrete-time Markov chains (DTMCs)

• Query the models

BEAR

A real-world case study

• Small example, but general enough:

• URL with parameters

• URL with parametric structure

URL                         Description
/home/                      Homepage of findyourhouse.com
/anncs/sales/               First page of the sales announcements
/anncs/sales/?page=<n>      Nth page of the sales announcements
/anncs/sales/<id>/          Detailed view of a sales announcement
/anncs/renting/             First page of the renting announcements
/anncs/renting/?page=<n>    Nth page of the renting announcements
/anncs/renting/<id>/        Detailed view of a renting announcement
/search/                    Page containing the results of a search
/admin/.../                 Website's control panel
/admin/login/               Login page that gives access to the control panel
/contacts/                  Page with the form to contact a sales agent
/contacts/submit/           Contact form has been submitted
/contacts/tou/              Page that describes the website's terms of use

URLs ➔ Atomic Propositions

• A set of atomic propositions (AP) gives semantics to the entries in the log

• Declarative approach: @BearFilter

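@BearFilter(regex="^/anncs/sales/(\\w+)/$")
public static Proposition filterSales(LogLine line) {
    // Every sales announcement detail page maps to the same proposition.
    return new Proposition("sales_anncs");
}

@BearFilter(regex="^/admin/login/$")
public static Proposition filterLogin(LogLine line) {
    // The slide treats HTTP status 302 on the login URL as a successful login.
    // Assumes getHTTPStatusCode() returns the status as a String.
    if ("302".equals(line.getHTTPStatusCode()))
        return new Proposition("login_success");
    else
        return new Proposition("login_fail");
}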

URLs ➔ Atomic Propositions

URL                         Atomic propositions
/home/                      homepage
/anncs/sales/               sales_page, page_1
/anncs/sales/?page=<n>      sales_page, page_n
/anncs/sales/<id>/          sales_anncs
/anncs/renting/             renting_page, page_1
/anncs/renting/?page=<n>    renting_page, page_n
/anncs/renting/<id>/        renting_anncs
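The URL-to-proposition mapping in the table can be sketched without the BEAR annotations, using only `java.util.regex`. This is an illustrative sketch, not BEAR's implementation: the class name `UrlToPropositions`, the method `propositions`, and the choice to render `page_n` with the concrete page number are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps a request URL to the atomic propositions of the table.
class UrlToPropositions {
    private static final Pattern HOME = Pattern.compile("^/home/$");
    // Listing pages, with an optional "?page=<n>" query string.
    private static final Pattern LISTING =
            Pattern.compile("^/anncs/(sales|renting)/(?:\\?page=(\\d+))?$");
    // Detail pages: "/anncs/<kind>/<id>/".
    private static final Pattern DETAIL =
            Pattern.compile("^/anncs/(sales|renting)/(\\w+)/$");

    static List<String> propositions(String url) {
        List<String> props = new ArrayList<>();
        Matcher m;
        if (HOME.matcher(url).matches()) {
            props.add("homepage");
        } else if ((m = LISTING.matcher(url)).matches()) {
            props.add(m.group(1) + "_page");
            // A missing page parameter means the first page.
            props.add("page_" + (m.group(2) == null ? "1" : m.group(2)));
        } else if ((m = DETAIL.matcher(url)).matches()) {
            props.add(m.group(1) + "_anncs");
        }
        return props; // empty when no rule applies
    }

    public static void main(String[] args) {
        for (String url : new String[] {"/home/", "/anncs/sales/",
                "/anncs/renting/?page=3", "/anncs/sales/1756/"}) {
            System.out.println(url + " -> " + propositions(url));
        }
    }
}
```

URLs that match no rule yield an empty list here; how BEAR itself treats unmatched log entries is not stated on the slides.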

Identify User Classes

• Code fragments called classifiers specify user classes

• Declarative approach: @BearClassifier

@BearClassifier(name="userAgent")
public static String UserAgentClassifier(LogLine logline) {
    // The class label is the raw user-agent string of the request.
    return logline.getAgent();
}

Example of a resulting user class:

{(userAgent = "Mozilla/5.0..."), (location = "Boston")}
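How a set of named classifiers turns one log line into a user class such as the one above can be sketched as follows. This is a minimal sketch under assumptions: the `LogLine` fields, the `CLASSIFIERS` registry, and the `device` classifier are hypothetical and stand in for whatever BEAR derives from `@BearClassifier` methods.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: each classifier contributes one (name = value) pair.
class UserClasses {
    static final class LogLine {
        final String ip;
        final String agent;
        final String url;

        LogLine(String ip, String agent, String url) {
            this.ip = ip;
            this.agent = agent;
            this.url = url;
        }
    }

    // Classifier name -> function that extracts the class value from a log line.
    static final Map<String, Function<LogLine, String>> CLASSIFIERS = new LinkedHashMap<>();
    static {
        CLASSIFIERS.put("userAgent", line -> line.agent);
        // Assumed extra classifier, for illustration only.
        CLASSIFIERS.put("device",
                line -> line.agent.matches(".*(Android|iOS).*") ? "mobile" : "desktop");
    }

    // Applies every registered classifier to the log line.
    static Map<String, String> classify(LogLine line) {
        Map<String, String> labels = new LinkedHashMap<>();
        CLASSIFIERS.forEach((name, classifier) -> labels.put(name, classifier.apply(line)));
        return labels;
    }

    public static void main(String[] args) {
        LogLine line = new LogLine("1.1.1.1", "Mozilla/5.0 (Linux; Android 4.4)", "/home/");
        System.out.println(classify(line));
    }
}
```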

Infer the models

• BEAR infers a set of DTMCs

• Sequential and incremental process

• An independent DTMC for each user class

Infer the models

IP        TIMESTAMP                URL
1.1.1.1 - [20/Dec/2013:15:35:02] - /home/
2.2.2.2 - [20/Dec/2013:15:35:07] - /admin/login/
1.1.1.1 - [20/Dec/2013:15:35:12] - /anncs/sales/1756/
2.2.2.2 - [20/Dec/2013:15:35:19] - /admin/edit/

Infer the models (incrementality)
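The sequential, incremental inference can be illustrated by counting transitions between consecutive states of the same client and normalising each row on demand. This is a sketch under stated assumptions, not BEAR's algorithm: clients are identified by IP only, every client begins in an assumed `start` state, session timeouts and the `end` state are omitted, and the state names `login_success` and `admin_edit` assigned to the two `/admin/` requests are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: incremental DTMC estimation from a stream of (ip, state) events.
class DtmcInference {
    private final Map<String, String> lastState = new HashMap<>();
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    // Processes one log entry; only counters change, so the model can grow line by line.
    void observe(String ip, String state) {
        String previous = lastState.getOrDefault(ip, "start");
        counts.computeIfAbsent(previous, k -> new TreeMap<>()).merge(state, 1, Integer::sum);
        lastState.put(ip, state);
    }

    // Maximum-likelihood estimate: count(from -> to) / count(from -> anything).
    double probability(String from, String to) {
        Map<String, Integer> row = counts.get(from);
        if (row == null) {
            return 0.0;
        }
        int total = 0;
        for (int c : row.values()) {
            total += c;
        }
        return row.getOrDefault(to, 0) / (double) total;
    }

    public static void main(String[] args) {
        DtmcInference model = new DtmcInference();
        // The four log lines of the slide, already mapped to (assumed) state names.
        model.observe("1.1.1.1", "homepage");
        model.observe("2.2.2.2", "login_success");
        model.observe("1.1.1.1", "sales_anncs");
        model.observe("2.2.2.2", "admin_edit");
        System.out.println("P(start -> homepage) = " + model.probability("start", "homepage"));
        System.out.println("P(homepage -> sales_anncs) = "
                + model.probability("homepage", "sales_anncs"));
    }
}
```

Keeping raw counts rather than probabilities is what makes the update incremental: a new log line touches one counter, and probabilities are recomputed only when queried.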

Annotating the models (extensibility)

• Rewards: domain-specific metrics of interest

• Number of announcements displayed

• DB queries

Specifying the properties (generality)

• Probabilistic Computation Tree Logic (PCTL) augmented with rewards

• BEAR property = scope + PCTL formula

{userAgent = "(.*)Mozilla(.*)"} P=? [F contact_requested]

{userAgent = "(.*)(Android|iOS)(.*)"} R=? [F end]

Querying the models (automation)

• The scope identifies the set of relevant DTMCs among the inferred models

• The BEAR analysis engine composes the selected DTMCs into a single one

• PCTL verification is performed with PRISM on the composed model
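BEAR delegates verification to PRISM, so the following is only an illustration of what a query of the form `P=? [F target]` asks for: the probability of eventually reaching a target state. The sketch computes it by fixed-point iteration on a small transition matrix; the class name, the three-state chain, and its probabilities are invented for the example.

```java
// Hypothetical sketch: reachability probabilities on a DTMC by fixed-point iteration.
class Reachability {
    // p[s][t] is the transition probability from state s to state t.
    // Returns, for every state, an approximation of the probability of
    // eventually reaching a target state.
    static double[] reach(double[][] p, boolean[] target, int iterations) {
        int n = p.length;
        double[] x = new double[n];
        for (int s = 0; s < n; s++) {
            x[s] = target[s] ? 1.0 : 0.0;
        }
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            for (int s = 0; s < n; s++) {
                if (target[s]) {
                    next[s] = 1.0; // a target state has already been reached
                    continue;
                }
                double sum = 0.0;
                for (int t = 0; t < n; t++) {
                    sum += p[s][t] * x[t];
                }
                next[s] = sum;
            }
            x = next;
        }
        return x;
    }

    public static void main(String[] args) {
        // States: 0 = homepage, 1 = contact_requested, 2 = end (made-up numbers).
        double[][] p = {
            {0.5, 0.2, 0.3},
            {0.0, 0.0, 1.0},
            {0.0, 0.0, 1.0},
        };
        boolean[] target = {false, true, false};
        System.out.println("P(F contact_requested) from homepage is about "
                + reach(p, target, 200)[0]);
    }
}
```

Starting from zero on non-target states, the iteration converges from below to the reachability probability; a model checker such as PRISM uses more refined numerical methods and termination criteria.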

Model Composition

• Union of the sets of states of the input DTMCs

• Law of total probability to compute transitions
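One way to read the two bullets is sketched below: the composed model takes the union of the states, and each composed transition probability is the per-class probability weighted by how likely a visit to the source state comes from that class. The weighting by visit counts is an assumption of this sketch (the slide only names the law of total probability), and the `Model` type and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: P(s -> t) = sum over classes k of P(k | s) * P_k(s -> t),
// with P(k | s) estimated as visits_k(s) / total visits to s.
class DtmcComposition {
    static final class Model {
        final Map<String, Integer> visits = new TreeMap<>();
        final Map<String, Map<String, Double>> p = new TreeMap<>();

        Model visited(String state, int n) {
            visits.put(state, n);
            return this;
        }

        Model transition(String from, String to, double prob) {
            p.computeIfAbsent(from, k -> new TreeMap<>()).put(to, prob);
            return this;
        }
    }

    static Map<String, Map<String, Double>> compose(List<Model> models) {
        // Total visits per state across all classes (states are unioned implicitly).
        Map<String, Integer> totalVisits = new TreeMap<>();
        for (Model m : models) {
            m.visits.forEach((state, n) -> totalVisits.merge(state, n, Integer::sum));
        }
        Map<String, Map<String, Double>> result = new TreeMap<>();
        for (Model m : models) {
            for (Map.Entry<String, Map<String, Double>> row : m.p.entrySet()) {
                String from = row.getKey();
                int total = totalVisits.getOrDefault(from, 0);
                if (total == 0) {
                    continue; // no recorded visits, so no weight can be computed
                }
                double weight = m.visits.getOrDefault(from, 0) / (double) total;
                Map<String, Double> out = result.computeIfAbsent(from, k -> new TreeMap<>());
                row.getValue().forEach((to, prob) -> out.merge(to, weight * prob, Double::sum));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Model desktop = new Model().visited("homepage", 3)
                .transition("homepage", "sales_anncs", 1.0);
        Model mobile = new Model().visited("homepage", 1)
                .transition("homepage", "renting_anncs", 1.0);
        System.out.println(compose(Arrays.asList(desktop, mobile)));
    }
}
```

Because the weights for a given source state sum to one, each composed row is again a probability distribution whenever the input rows are.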

BEAR at work

• Detecting navigational anomalies: a difference between the actual and the expected user navigation actions

• Comparing the BEAR models with the site map:

{} P=? [X s_i] {s_j}

• Measuring behaviours and attitudes:

{} P=? [(F sales_anncs) & (!(F renting_anncs))]

{(?!(.*)(Android|iOS))(.*)} R=? [F end {sales_anncs}]

BEAR: performance

• Variable number of states

• Variable length of log file

BEAR: performance

• Variable number of DTMCs

• Variable number of states

• More expressive formalisms

• Self-adaptive applications

Summary

• Formal analysis of user behaviours in web apps

• Validation on a real case study

• Ongoing validation on a mobile app