Content Analysis Techniques to Ease Browsing with Handhelds

Jalal Mahmud

Yevgen Borodin

I.V. Ramakrishnan

Department of Computer Science, State University of New York at Stony Brook

Stony Brook, NY 11794

Outline

Browsing with Handhelds

Content Analysis Techniques:

- Model-directed Web Transaction

- Merchant-Side Web Transaction

- Context Browsing with Mobile

- Context-directed Web Transaction

Evaluation

Future Work

Browsing with Handheld

User needs to do a lot of scrolling to get to the relevant content

Using PDA

Relevant Content

Problems

Small Screens Offer Narrow Interaction Bandwidth.

Unable to convey the Richness of the Web content.

Involves a Lot of Horizontal and Vertical Scrolling.

Tedious to Get to the Pertinent Content in a Page.

This is worse when one is interested in Web transactions (e.g. buying books, paying utility bills).

Our Approach

Relevant content

Irrelevant content

Filter Away Irrelevant Content and Only Present Relevant Content

First Present the Relevant Content.

Model-directed Web Transaction

Web Transaction Examples:

- Buying a CD Player from Bestbuy

- Paying Utility Bills Online

Web Transaction Characteristics:

- A Sequence of Steps

- Each Step is Based on User-Selected Operation

Two aspects of a Web transaction:

- Semantic Concept

- Process Model
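
A minimal Python sketch of how these two aspects could fit together: a process model represented as a finite automaton whose states are tied to semantic concepts and whose transitions are user-selected operations. The operation names are taken from the slides; the `ProcessModel` class, the transition table, and the concepts attached to each state are illustrative assumptions, not the authors' learned model.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessModel:
    """Finite automaton: states carry concepts, edges are user operations."""

    start: int
    final: int
    # state -> concepts expected on the page shown in that state (assumed)
    concepts: dict[int, set[str]] = field(default_factory=dict)
    # (state, operation) -> next state (assumed, for illustration only)
    transitions: dict[tuple[int, str], int] = field(default_factory=dict)

    def operations(self, state: int) -> list[str]:
        """Operations the user may select in the given state."""
        return sorted(op for (s, op) in self.transitions if s == state)

    def run(self, ops: list[str]) -> bool:
        """Return True if the operation sequence ends in the final state."""
        state = self.start
        for op in ops:
            key = (state, op)
            if key not in self.transitions:
                return False  # operation not allowed in this state
            state = self.transitions[key]
        return state == self.final


# Hypothetical shopping model; the edges below are invented for the example.
MODEL = ProcessModel(
    start=1,
    final=6,
    concepts={
        1: {"Taxonomy", "SearchForm"},
        2: {"SearchForm", "SearchResult"},
    },
    transitions={
        (1, "select_item_category"): 1,
        (1, "submit_searchform"): 2,
        (2, "item_select"): 3,
        (3, "add_to_cart"): 4,
        (4, "continue_shopping"): 1,
        (4, "check_out"): 6,
    },
)

print(MODEL.operations(1))
print(MODEL.run(["submit_searchform", "item_select", "add_to_cart", "check_out"]))
```

In a design like this, the current state tells the browser which concepts to look for on the page, and the extracted concepts in turn determine which operations to offer the user.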

Semantic Concepts

[Figure: example page regions labeled with semantic concepts: Search Results, Taxonomy, Add to Cart, Product Details]

Process Model

[Figure: process model for a model-driven transaction, drawn as a state machine and built up over several slides. States are numbered 1 to 6; state 1 is the start state and state 6 is the final state. The build-up slides show state 1 alongside TAXONOMY CONCEPT and SEARCH FORM CONCEPT, and state 2 alongside SEARCH FORM CONCEPT and SEARCH RESULT CONCEPT. Transition labels: select_item_category, item_select, submit_searchform, show_item_detail, add_to_cart, view_shoppingcart, update_shoppingcart, continue_shopping, check_out.]

Evaluation Results: Process Model

Built using Automata Learning Techniques

Training Data: Over 200 Transaction Sequences Collected from over 30 Sites

Recall / Precision:

- 90% / 96% for Books domain

- 86% / 88% for Consumer Electronics domain

- 84% / 92% for Office Supplies domain
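
For reference, recall and precision are conventionally computed from counts of true positives, false positives, and false negatives, and the F-measure is their harmonic mean. The sketch below shows only that arithmetic; the function name and the counts in the usage line are made up for illustration and are not the authors' data.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 90 correct predictions, 10 spurious, 30 missed.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```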

Concept Extraction

[Figure: LOGICAL TREE and CONCEPT TREE for a sample page. The concept tree contains Taxonomy, Search Result, and Search Form nodes. Leaf labels visible in the figure include Sort Results By, Select Box, Image, Browse, Best Matches, Brand, Insignia, Sony, Case Logic, Camera, Software, Electronics, Search Phrase, Go Button, and Entire Site.]

Evaluation Results: Concept Extraction

Developed a Statistical Model for Each Concept using Machine Learning Techniques

Training Data: Used Labeled Concepts from Over 100 Pages Collected from Two Dozen Sites

Evaluation Results

[Figure: bar chart, Recall for Concept Extraction. Y-axis: 0% to 100%. X-axis concepts: Search Form, Search Result, Item Taxonomy, Item List, Item Detail, Shopping Cart, Add to Cart, Edit Cart, Continue Shopping, Checkout. Series: Books, Electronics, Office Supplies.]

Model-directed Web Transaction on Handheld: Guide-O-Mobile

[Figure: Guide-O-Mobile]

Outline

Browsing with Handhelds

Content Analysis Techniques:

- Model-directed Web Transaction

- Merchant-Side Process Modeling

- Context Browsing with Mobile

- Context-directed Web Transaction

Evaluation

Future Work

Client-Side Process Modeling: Problems

Client-Side Process Modeling in Guide-O-Mobile.

Process Model is Stored on the Client Side.

Separate Process Model Needed for Each Domain.

Performance Largely Depends on Concept Extraction.

Merchant-Side Process Modeling

Labeled Web Content with Semantic Annotations.

Content Providers will Label their Web Content.

XHTML will be Used to:

- Label Relevant Content in the Web Sites

- Describe Process Models Specific to the Sites

Mobile Users will Use the System to:

- Easily Identify Relevant Information

- Perform On-Line Transactions

Prototype Implementation

XHTML tags:

<log in>, <continue shopping>, <add to cart>, <edit cart>, <search form>, <search result>, <item>, <item taxonomy>, <item list>, <item detail>, <item description>, and <checkout>.
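
The slides do not show the markup itself. As a purely hypothetical illustration, the sketch below assumes that each label is carried in a `class` attribute with hyphens in place of spaces (for example `search-form`), and uses Python's standard `xml.etree.ElementTree` to collect the labeled regions of a well-formed XHTML fragment. The `LABELS` set, the sample `PAGE`, and the `labeled_regions` helper are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical label vocabulary, derived from the tag names listed above.
LABELS = {
    "log-in", "continue-shopping", "add-to-cart", "edit-cart",
    "search-form", "search-result", "item", "item-taxonomy",
    "item-list", "item-detail", "item-description", "checkout",
}

# Hypothetical merchant page fragment; the class-attribute convention is assumed.
PAGE = """
<div>
  <form class="search-form"><input name="q" /></form>
  <ul class="item-taxonomy"><li>Cameras</li><li>Software</li></ul>
  <div class="banner">Free shipping</div>
  <a class="add-to-cart" href="/cart/add">Add to cart</a>
</div>
"""


def labeled_regions(xhtml: str) -> dict[str, list[ET.Element]]:
    """Map each known label to the elements annotated with it."""
    root = ET.fromstring(xhtml.strip())
    found: dict[str, list[ET.Element]] = {}
    for element in root.iter():
        for name in element.get("class", "").split():
            if name in LABELS:
                found.setdefault(name, []).append(element)
    return found


regions = labeled_regions(PAGE)
print(sorted(regions))  # unlabeled content such as the banner is skipped
```

With annotations of this kind, a handheld client could show only the labeled regions and would not need to run concept extraction on the page.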

Outline

Browsing with Handhelds

Content Analysis Techniques:

- Model-directed Web Transaction

- Merchant-side Web Transaction

- Context Browsing with Mobile

- Context-directed Web Transaction

Evaluation

Future Work

Context Browsing with Mobile

On Following a Link:

- Collect Context of the Link

- Identify the Relevant Section on the Next Page Using the Context

- Present the Relevant Section

Context Browsing:

- Reduces Information Overload

- Makes Mobile Browsing Faster
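
The slides do not specify how the relevant section is matched against the link context. A minimal sketch, assuming a simple word-overlap (Jaccard) score as a stand-in technique: the helper names `tokenize`, `jaccard`, and `most_relevant_section` are hypothetical, and the sample context and sections are invented for the example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_relevant_section(link_context: str, sections: list[str]) -> int:
    """Return the index of the section that best overlaps the link context."""
    if not sections:
        raise ValueError("the page has no sections to rank")
    context = tokenize(link_context)
    scores = [jaccard(context, tokenize(section)) for section in sections]
    return max(range(len(sections)), key=scores.__getitem__)


# Context collected around the followed link (invented sample text).
context = "MP3 Players portable audio"
# Sections of the next page (invented sample text).
sections = [
    "Sign in to your account or register",
    "MP3 players: compare portable audio players by brand",
    "Customer service, returns, and shipping",
]
best = most_relevant_section(context, sections)
print(best, sections[best])
```

The handheld would then display `sections[best]` first, so the user does not have to scroll past navigation and other unrelated content.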

Context-directed Browsing

How Do We Find Relevant Content?

Finding What is Important on a Web Page:

- Is Subjective on Any Distinct Page

- Can be Inferred in a Sequence of Pages

[Figure sequence: walkthrough of context-directed browsing across three slides, with these step captions]

- Click on the "MP3 Players" Link

- Collect Context of the Link

- Find Relevant Section Using Context

- Click the Link, Collect Context

Context Browsing with Mobile: CMo Prototype

Product Search Using CMo

Outline

Browsing with Handhelds

Content Analysis Techniques:

- Model-directed Web Transaction

- Merchant-side Web Transaction

- Context Browsing with Mobile

- Context-directed Web Transaction

Evaluation

Future Work

Context-directed Web Transaction

No Process Model

Contextual Browsing with a Domain-Dependent Knowledge-Base

Relevant Segment Identification Using Contextual Browsing

Concept Segment Identification Using Knowledge-Base and Heuristic Algorithms

Context-directed Web Transaction: Prototype System

The Online Shopping Knowledge-Base Consists of the Following Few Concepts:

SearchForm, AddToCart, Taxonomy, ShoppingCart, Checkout, etc.

Implementing the Prototype is a Work in Progress.
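
As a rough illustration of concept identification with a knowledge base and a heuristic, the sketch below maps each concept named above to a few cue phrases and labels a page segment with every concept whose cue appears in its text. The cue phrases and the `identify_concepts` helper are assumptions made for this example; the actual knowledge base and heuristics are not shown in the slides.

```python
# Hypothetical knowledge base: concept name -> cue phrases (all lowercase).
KNOWLEDGE_BASE = {
    "SearchForm": ["search", "find"],
    "AddToCart": ["add to cart", "add to basket"],
    "Taxonomy": ["browse by category", "departments"],
    "ShoppingCart": ["your cart", "shopping cart"],
    "Checkout": ["checkout", "check out"],
}


def identify_concepts(segment_text: str) -> list[str]:
    """Return the concepts whose cue phrases occur in the segment text."""
    text = segment_text.lower()
    return [
        concept
        for concept, cues in KNOWLEDGE_BASE.items()
        if any(cue in text for cue in cues)
    ]


print(identify_concepts("Add to Cart"))
print(identify_concepts("Your Cart (2 items): proceed to checkout"))
```

Plain substring matching is deliberately naive: it is enough to show the lookup, but a real system would also need structural cues such as form fields, buttons, and link layout.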

Evaluation: Guide-O-Mobile Experimental Set-Up

Client-Server Model:

- Client: 400 MHz iPaq with 64 MB RAM

- Server: Core Guide-O System

- 1.2 GHz desktop with 256 MB RAM

Evaluation:

- Over two dozen CS graduate students

- Over 30 web sites spanning Books, Consumer Electronics and Office Supplies domains

Evaluation: Guide-O-Mobile Overall Time Performance

[Figure: bar chart, Overall Time. Y-axis: Time (sec), 0 to 600. X-axis: Books, Electronics, Office Supplies. Series: Original Page in Handheld, Guide-O-Mobile.]

Evaluation: Guide-O-Mobile Overall Time Performance with Standard Deviation

[Figure: bar chart, Overall Time. Y-axis: Time (sec), 0 to 600. X-axis: Books, Electronics, Office Supplies. Series: Original Page in Handheld, Guide-O-Mobile, Standard Deviation.]

Evaluation: Guide-O-Mobile Interaction Time

[Figure: bar chart, Interaction Time. Y-axis: Time (sec), 0 to 250. X-axis: Books, Electronics, Office Supplies. Series: Original Page in Handheld, Guide-O-Mobile.]

Evaluation: Guide-O-Mobile Interaction Time Performance with Standard Deviation

[Figure: bar chart, Interaction Time. Y-axis: Time (sec), 0 to 300. X-axis: Books, Electronics, Office Supplies. Series: Original Page in Handheld, Guide-O-Mobile, Standard Deviation.]

Evaluation: CMo Experimental Set-Up

Client-Server Model:

- Client: iPAQ Pocket PC running the Microsoft Pocket PC operating system, with wireless Internet connectivity

- Server: Core CMo System

Evaluation:

- 8 CS graduate students completing 8 tasks (8 times each) on 8 Web sites from the News and Shopping domains

Evaluation: CMo Performance of Context Identification

[Figure: bar chart. Y-axis: Accuracy, 0.00% to 100.00%. X-axis: Domains (News, Books, Electronics, Office, Informational). Series: Recall, Precision, F-measure.]

Evaluation: CMo Relevant Information Identification

[Figure: bar chart. Y-axis: Accuracy, 0% to 100%. X-axis: Domains. Series: Recall, Precision, F-measure.]

Browsing Efficiency with CMo

Conclusion and Future Work

Port all the Server Steps to the Handheld.

Extend Mozilla's Minimo Mobile Browser with CMo Functionalities.

Mine Transactional Models from Contextual Information.

Questions?