Efficient Crawling of Complex Rich Internet Applications Ali Moosavi, Salman Hooshmand, Gregor v. Bochmann, Guy-Vincent Jourdan, Iosif Viorel Onut
School of Electrical Engineering and Computer Science - University of Ottawa
Introduction – RIAs and Crawling
Crawling is the task of going through all the contents of a web application automatically. Crawlers are used for content indexing, black-box security testing, automated maintenance, etc.
Rich Internet Applications (RIAs) are a new generation of web applications that make heavy use of client-side code to present content, using technologies such as AJAX. RIAs modify the current page without reloading the complete page. This technique has numerous benefits, such as better interactivity, reduced traffic, and better responsiveness.
Traditional web crawlers, however, are unable to explore RIAs. The problem of crawling RIAs has been a focus of research in recent years, and solutions have been proposed based on constructing a state machine in which:
State = DOM of the webpage
Transition = JavaScript event execution
In order to ensure coverage, a crawler has to execute all events from all states.
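The DOM-based approach described above can be sketched as a breadth-first exploration of this state machine. The sketch below is illustrative, not the authors' implementation; `events_of` and `execute` stand in for hypothetical browser-driver hooks.

```python
from collections import deque

def crawl_dom_based(initial_dom, events_of, execute):
    """Breadth-first crawl of a RIA modelled as a state machine:
    state = DOM, transition = JavaScript event execution.
    `events_of(dom)` lists the events available on a DOM and
    `execute(dom, event)` returns the DOM reached by firing the event.
    Both callbacks are hypothetical browser-driver hooks."""
    seen = {initial_dom}                     # states identified by their DOM
    frontier = deque([initial_dom])
    transitions = []                         # (state, event, next_state) triples
    while frontier:
        dom = frontier.popleft()
        for event in events_of(dom):         # execute all events from all states
            nxt = execute(dom, event)
            transitions.append((dom, event, nxt))
            if nxt not in seen:              # any unseen DOM counts as a new state
                seen.add(nxt)
                frontier.append(nxt)
    return transitions
```

Because every event must be executed from every state, the cost of this loop grows with the product of the number of states and the number of events per state.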
Methodology – Component-based Crawling
Our method models the RIA in terms of components and their individual states, hence it is called component-based crawling. In component-based crawling:
Results (cont.)
Acknowledgments
This work is supported in part by IBM and the Natural Sciences and Engineering Research Council of Canada.
DISCLAIMER: The views expressed in this poster are the sole responsibility of the authors and do not necessarily reflect those of the Center for Advanced Studies of IBM.
Motivation and Aim
Conclusions
Component-based crawling provides superior efficiency compared to the current (DOM-based) methods:
- Exponential speed-up in crawling, hence the ability to crawl websites that are not crawlable using DOM-based methods
- Reduced model size, leading to more efficient security testing
Results
Current DOM-based solutions are only effective on small-scale RIAs. When faced with real-life RIAs, they quickly lose scalability and face state space explosion.
The main reason for state space explosion is that different mixtures of the independent components of a RIA can produce a great variety of DOMs with no new data.
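The combinatorics behind this explosion can be illustrated with hypothetical numbers: with k independent widgets on a page, each having s local states, the number of distinct DOMs grows as s^k, while the number of component states grows only as k·s.

```python
# Hypothetical page with k independent widgets, each with s local states.
k, s = 10, 3

dom_states = s ** k        # every mixture of widget states yields a distinct DOM
component_states = k * s   # component-based crawling tracks each widget separately

print(dom_states)          # 59049 DOM states to explore
print(component_states)    # 30 component states to explore
```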
Scalability
[Chart: ClipMarks — crawling time (h:mm:ss) vs. number of items in the website; series: Component-Based, Greedy]
[Chart: Bebop — crawling time (h:mm:ss) vs. number of items in the website; series: Component-Based, DOM-Based]
[Chart: ElFinder — crawling time (h:mm:ss) vs. number of files to show; series: Component-Based, DOM-Based]
[Chart: Comparison of Crawling Efficiency on 6 Test Cases — exploration cost of DOM-Based vs. Component-Based on TestRIA, Altoro Mutual, ClipMarks, Periodic Table, ElFinder, and Bebop]
[Figure: example application — DOM states 1 through 5, with each DOM state decomposed as a sum of component states]
In the example above, current methods explore the red widget once with the blue widget open (state 3) and once with the blue widget closed (state 5). These two are considered separate states of the application, and no connection is found between them. This simple website takes hours to crawl.
Real-world websites contain many independent components. The picture below shows an example from Facebook.com, with independent components marked with red borders. Current methods are unable to crawl such complex RIAs.
We partition each DOM into components (subtrees of the HTML tree). Whereas in DOM-based crawling the current webpage has a state and states belong to webpages, in component-based crawling each component has its own set of states.
Using component-based crawling, the crawler recognizes the opened red widget as an already-visited state (state 3) and knows how it behaves (e.g., the yellow transition). Therefore, there is no need to explore this branch further.
In component-based crawling, we model the website as a multi-state-machine. The multi-state-machine can be represented as a tuple (A, I, E, δ), where A is the set of states, I ⊆ A is the set of initial states (those that are present in the DOM when the URL is loaded), E is the set of events, and δ is a function that defines the set of valid transitions. The picture below illustrates the modelling of an event execution in our method versus DOM-based methods.
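The exploration loop over component states can be sketched as follows. This is a minimal sketch, not the authors' implementation: the callbacks are hypothetical, and an event is assumed to return the component states present after it fires.

```python
from collections import deque

def crawl_component_based(initial_components, events_of, execute):
    """Sketch of component-based exploration (names hypothetical).
    States are component states rather than whole DOMs, so a widget
    that reappears in a new page mixture is recognized and skipped.
    `events_of(c)` lists the events attached to component state c;
    `execute(c, event)` returns the component states after the event."""
    seen = set(initial_components)          # A: discovered component states
    frontier = deque(initial_components)    # I: component states in the initial DOM
    delta = []                              # transitions (c, event, resulting states)
    while frontier:
        comp = frontier.popleft()
        for event in events_of(comp):
            result = execute(comp, event)
            delta.append((comp, event, tuple(result)))
            for c in result:
                if c not in seen:           # explore only new component states
                    seen.add(c)
                    frontier.append(c)
    return delta
```

On the red/blue widget example, this loop fires each widget's events once per component state instead of once per DOM mixture, which is where the exponential savings come from.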
These charts show the effect of increasing/decreasing the size of some websites on total crawling time. Where DOM-based crawling time grows exponentially, component-based crawling is able to scale nearly linearly.
Crawling Efficiency
                 TestRIA   Altoro Mutual   ClipMarks   Periodic Table   ElFinder    Bebop
DOM-Based          1,003           2,576      12,398           31,814     30,833   72,290
Component-Based      142             308         443            3,856      2,733      293
This chart compares DOM-based and component-based methods on 6 websites, based on the exploration cost, which is:
Model Size
[Figures: DOM-based vs. component-based models extracted from TestRIA, Altoro Mutual, and ClipMarks]
Component-based crawling yields smaller and simpler models from RIAs, which increases the efficiency of performing security tests using the models.
n_e + n_r × w_r
where n_e is the number of events executed, n_r is the number of resets performed, and w_r is the weight of a reset.
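As a worked example of the cost measure (all numbers hypothetical, not taken from the table above):

```python
def exploration_cost(n_e, n_r, w_r):
    """Exploration cost = n_e + n_r * w_r:
    events executed plus resets weighted by the cost of a reset."""
    return n_e + n_r * w_r

# Hypothetical crawl: 1,000 events, 10 resets, each reset as costly as 18 events.
print(exploration_cost(1000, 10, 18))   # 1180
```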