Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer...
Date posted: 19-Dec-2015
![Page 1: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/1.jpg)
Detecting and Representing Relevant Page-Level Web Deltas
Sanjay Kumar Madria
Department of Computer Science
Purdue University
West Lafayette, IN 47907
[email protected]
![Page 2: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/2.jpg)
Current Situation of W3
• The Web allows information to change at any time and in any way
• Two forms of change
  • Existence
  • Structure and content modification
• A change leaves no trace of the previous document
  • The new version replaces its antecedents, leaving no trace
![Page 3: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/3.jpg)
Problems of Change Management
• Problem: detecting, representing and querying these changes
• The problem is challenging
  • Typical database approaches to detecting changes, based on triggering mechanisms, are not usable
  • Information sources typically do not keep track of historical information in a format that is accessible to the outside user
![Page 4: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/4.jpg)
Motivating Example
• Assume that there is a web site at www.panacea.gov
• It provides information related to drugs used for various diseases
![Page 5: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/5.jpg)
Motivating Example
• Suppose that, on 15th January, a user wishes to find out periodically (every 30 days)
  • information related to side effects and uses of drugs used for various diseases, and
  • changes to this information at the page level compared to its previous version
![Page 6: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/6.jpg)
Structure of www.panacea.gov
• The web page at www.panacea.gov contains a list of diseases
• Each link for a particular disease points to a web page containing a list of drugs used for prevention and cure of the disease
• Hyperlinks associated with each drug point to documents containing a list of various issues related to a particular drug (description, manufacturers, clinical pharmacology, uses, side effects etc.)
• From the hyperlinks associated with each issue, one can retrieve details of these issues for a particular drug
![Page 7: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/7.jpg)
A Snapshot as on 15th Jan
[Figure: link structure of the site on 15th January. Disease links: AIDS, Cancer, Heart disease, Diabetes, Impotence, Alzheimer’s Disease. Drug nodes: Indavir, Ritonavir, Niacin, Hirudin, Vasomax, Caverject, Ibuprofen. Issue pages under the drugs: Side effects, Uses.]
![Page 8: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/8.jpg)
Some Changes: 25th January
• Links related to Diabetes are removed
• A new link containing information related to Parkinson’s Disease is added
• Information related to issues, side effects and uses of various drugs for Cancer is also modified
![Page 9: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/9.jpg)
A Partial Snapshot as on 25th Jan
[Figure: partial link structure of www.panacea.gov on 25th January, showing the disease links Parkinson’s Disease, Cancer and Diabetes, the drug Tolcapone, and Side effects and Uses pages.]
![Page 10: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/10.jpg)
Some Changes: 30th January
• Links related to Impotence are modified
  • Previously provided by www.pfizer.com
  • Now provided by www.panacea.gov
• The inter-linked structure of the Web pages related to Caverject is also modified
• Information about Viagra, a new drug for Impotence, is added
![Page 11: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/11.jpg)
A Partial Snapshot as on 30th Jan
[Figure: partial link structure of www.panacea.gov on 30th January, showing Impotence with the drugs Vasomax, Caverject and Viagra, and their Side effects and Uses pages.]
![Page 12: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/12.jpg)
Some Changes: 8th February
• The link structure of Heart Disease is modified
  • The label Heart Disease is changed to Heart Disorder
  • The content of the pages dealing with side effects and uses of Hirudin is updated
  • The inter-linked document structure of Niacin is modified
• Web pages related to the side effects and uses of Ibuprofen (Alzheimer’s Disease) are removed
![Page 13: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/13.jpg)
On 8th February
[Figure: partial link structure of www.panacea.gov on 8th February, showing Heart disorder and Alzheimer’s Disease, the drugs Niacin and Hirudin, and Side effects and Uses pages.]
![Page 14: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/14.jpg)
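A Snapshot as on 15th Feb
[Figure: link structure of the site on 15th February. Disease links: AIDS, Cancer, Heart disease, Impotence, Alzheimer’s Disease, Parkinson’s Disease. Drug nodes: Indavir, Ritonavir, Niacin, Hirudin, Vasomax, Caverject, Viagra. Issue pages: Side effects, Uses.]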
![Page 15: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/15.jpg)
Objectives
• Web deltas: changes to web information
• Detecting and representing relevant page-level web deltas
  • Changes that are relevant to the user’s query, not arbitrary changes or web deltas
  • Restricted to the page level
• Detect those documents which
  • are added to the site
  • are deleted from the site
  • have undergone content or structural modification
• Determine how these delta documents are related to one another and to other documents relevant to the user’s query
![Page 16: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/16.jpg)
The WHOWEDA Project
• WHOWEDA: A WareHouse of WEb DAta
• Goal: to design and implement a web warehousing system capable of effective extraction, management, and processing of information on the World Wide Web
• Data model: WHOM (WareHouse Object Model)
![Page 17: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/17.jpg)
Overview of WHOM
• Our web warehouse can be conceived of as a collection of web tables
• A set of web tuples and a set of web schemas represents a web table
• A web tuple is a directed graph containing nodes and links and satisfies a web schema
• Nodes and links contain content, metadata and structural information associated with Web documents and hyperlinks
  • Tree representation
• Web algebra containing web operators to manipulate web tables
  • Global Coupling, Web Select, Web Join etc.
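The data model described on this slide can be pictured with a few small Python classes. This is only an illustrative sketch, not WHOWEDA's actual representation: the class names, the field names, the sub-page URL and the date strings are all assumptions, and web schemas and the tree representation of node content are left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A Web document, reduced here to a few metadata fields."""
    url: str
    title: str
    last_modified: str


@dataclass(frozen=True)
class Link:
    """A hyperlink between two documents, identified by their URLs."""
    source_url: str
    target_url: str
    label: str = ""


@dataclass
class WebTuple:
    """A directed graph of nodes and links."""
    nodes: list
    links: list

    def successors(self, url):
        # URLs reachable from `url` by following one link of this tuple.
        return [link.target_url for link in self.links if link.source_url == url]


@dataclass
class WebTable:
    """A named collection of web tuples (web schemas are omitted here)."""
    name: str
    tuples: list = field(default_factory=list)


# Hypothetical data: only www.panacea.gov itself appears in the slides.
home = Node("www.panacea.gov", "home", "15 Jan")
aids = Node("www.panacea.gov/aids", "drug list", "15 Jan")
drugs = WebTable("Drugs", [WebTuple([home, aids], [Link(home.url, aids.url, "AIDS")])])
```

Keeping the URL and last modification date on each node matters later, because the web join compares exactly these two fields.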
![Page 18: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/18.jpg)
Overview of our approach
• Step 1: Two snapshots of old and new relevant data are coupled from the Web using the global web coupling operation and materialized in two web tables.
• Step 2: Web join, left outer web join and right outer web join operations are performed on these two web tables
  • The result is the joined, left outer joined and right outer joined web tables
• Step 3: Delta web tables containing different types of web deltas are generated from these resultant web tables.
• The following slides elaborate on these steps.
![Page 19: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/19.jpg)
Step 1: Retrieving snapshots of Web data using Global Web Coupling
![Page 20: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/20.jpg)
Web Query Specification
• Features:
  • Draw a web query as a directed connected acyclic graph (also called a coupling query)
  • The query can also be specified in text form
  • Specify search conditions on the nodes and edges of the graph
  • Performed by the global web coupling operator
![Page 21: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/21.jpg)
Coupling Query
• Set of node variables Xn
  • Each variable represents a set of Web documents
• Set of link variables Xl
  • Each variable represents a set of hyperlinks
• Set of connectivities C in DNF defined over node and link variables
  • Specifies the hyperlink structure of the documents
• Set of predicates P defined over some of the node and link variables
  • Specify metadata, content or structural conditions
• Set of coupling query predicates Q
  • Conditions on execution of the query
![Page 22: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/22.jpg)
Example
Suppose, on 15th January, a user wishes to find out periodically (every 30 days) from the web site at www.panacea.gov
information related to side effects and uses of drugs used for various diseases
The result of the query is stored in the form of a web table
![Page 23: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/23.jpg)
Coupling Query
Xn = {a, b, d, k}
Xl = { - }
P = {p1, p2, p3, p4}
p1(a) = METADATA:: a[url] EQUALS “www.panacea.gov”
p2(b) = CONTENT:: b[html.body.title] NON-ATTR-CONT “drug list”
p3(k) = CONTENT:: k[html.body.title] NON-ATTR-CONT “uses”
p4(d) = CONTENT:: d[html.body.title] NON-ATTR-CONT “side effects”
![Page 24: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/24.jpg)
Coupling Query
C = k1 AND k2 AND k3
k1 = a < - > b
k2 = b < -{1, 6} > d
k3 = b < -{1, 3} > k
Q = {q1}
q1(b) = COUPLING_QUERY:: polling_frequency EQUALS “30 days”
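To make the five components of the coupling query concrete, the example can be written down as plain Python data. This is a sketch under stated assumptions: the class and field names are hypothetical, and reading `{1, 6}` as "a path of between 1 and 6 links" is an interpretation of the slide notation rather than something the slides state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    variable: str    # node or link variable the predicate constrains
    kind: str        # "METADATA", "CONTENT" or "COUPLING_QUERY"
    attribute: str   # e.g. "url" or "html.body.title"
    operator: str    # e.g. "EQUALS" or "NON-ATTR-CONT"
    value: str


@dataclass(frozen=True)
class Connectivity:
    source: str
    target: str
    min_links: int = 1   # assumed meaning of the lower bound in {1, 6}
    max_links: int = 1   # assumed meaning of the upper bound in {1, 6}


@dataclass
class CouplingQuery:
    node_vars: set
    link_vars: set
    predicates: list
    connectivities: list    # read as k1 AND k2 AND k3
    query_predicates: list


def undeclared_variables(query):
    """Variables used in P, C or Q that are missing from Xn and Xl."""
    used = {p.variable for p in query.predicates + query.query_predicates}
    for c in query.connectivities:
        used.update((c.source, c.target))
    return used - query.node_vars - query.link_vars


query = CouplingQuery(
    node_vars={"a", "b", "d", "k"},
    link_vars={"-"},
    predicates=[
        Predicate("a", "METADATA", "url", "EQUALS", "www.panacea.gov"),
        Predicate("b", "CONTENT", "html.body.title", "NON-ATTR-CONT", "drug list"),
        Predicate("k", "CONTENT", "html.body.title", "NON-ATTR-CONT", "uses"),
        Predicate("d", "CONTENT", "html.body.title", "NON-ATTR-CONT", "side effects"),
    ],
    connectivities=[
        Connectivity("a", "b"),
        Connectivity("b", "d", 1, 6),
        Connectivity("b", "k", 1, 3),
    ],
    query_predicates=[
        Predicate("b", "COUPLING_QUERY", "polling_frequency", "EQUALS", "30 days"),
    ],
)
```

A check such as `undeclared_variables(query)` is one cheap way to catch a query whose predicates mention a variable that was never declared in Xn or Xl.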
![Page 25: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/25.jpg)
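Pictorial Representation
[Figure: coupling query graph. Node a (www.panacea.gov) links to node b (“drug list”); b connects to d (“side effects”) with connectivity {1, 6} and to k (“uses”) with connectivity {1, 3}.]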
![Page 26: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/26.jpg)
Web Table Drugs (15th Jan)
Web tuples (node identifiers: disease, drug):
• a0, b0, u0, k0, d0: AIDS, Indavir
• a0, b0, u1, k1, d1: AIDS, Ritonavir
• a0, b1, k2, d2: Cancer, Beta Carotene
• a0, b5, k12, d12: Alzheimer’s Disease, Ibuprofen
![Page 27: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/27.jpg)
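Web Table Drugs (15th Jan), continued
Web tuples (node identifiers: disease, drug):
• a0, b3, d4, k5: Diabetes, Albuterol
• a0, b4, u4, u5, u6, k6, d5: Impotence, Vasomax
• a0, b4, u7, u8, k7, d6: Impotence, Caverject
• a0, b2, u2, k3, d3: Heart Disease, Hirudin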
![Page 28: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/28.jpg)
Web Table New Drugs (15th Feb)
Web tuples (node identifiers: disease, drug):
• a0, b0, u0, k0, d0: AIDS, Indavir
• a0, b0, u1, k1, d1: AIDS, Ritonavir
• a0, b1, k2, d2: Cancer, Beta Carotene
• a0, b2, u2, k3, d3: Heart Disorder, Hirudin
![Page 29: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/29.jpg)
Web Table New Drugs (15th Feb), continued
Web tuples (node identifiers: disease, drug):
• a0, b2, u3, k7, d7: Heart Disorder, Niacin
• a0, b4, u7, k7, d6: Impotence, Caverject
• a0, b4, u9, k8, d8: Impotence, Vasomax
• a0, b6, u10, k10, d10: Parkinson’s Disease, Tolcapone
![Page 30: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/30.jpg)
Web Table New Drugs (15th Feb), continued
Web tuples (node identifiers: disease, drug):
• a0, b6, u10, k10, d10: Parkinson’s Disease, Tolcapone
• a0, b4, u12, k9, d9: Impotence, Viagra
![Page 31: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/31.jpg)
Step 2: Performing Web Join, Left and Right Outer Web Join
![Page 32: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/32.jpg)
Web Join
• Information composition operator
• Combines two web tables into a single web table under certain conditions
• Combines two web tables by concatenating a web tuple of one web table with a web tuple of the other web table whenever there exist joinable nodes
  • Two nodes are joinable if they are identical
  • Two nodes are identical if the URL and last modification date of the nodes are the same
• The joined web tuple is stored in a different web table
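The joinability rule above (same URL and same last modification date) can be sketched in a few lines of Python. This is a simplified illustration, not the WHOWEDA operator: web tuples are flattened to plain tuples of nodes, the concatenation is represented as a triple of (old tuple, new tuple, shared nodes), and the function and field names are assumptions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Node:
    url: str
    last_modified: str


def joinable_nodes(tuple1, tuple2):
    """Nodes that are identical (same URL and same date) in both tuples."""
    # Frozen dataclasses compare and hash by field values, so set
    # intersection implements the identity test directly.
    return set(tuple1) & set(tuple2)


def web_join(table1, table2):
    """Pair every two web tuples that share at least one joinable node."""
    joined = []
    for t1, t2 in product(table1, table2):
        shared = joinable_nodes(t1, t2)
        if shared:
            joined.append((t1, t2, shared))
    return joined


# Hypothetical node identifiers and version labels.
old_table = [(Node("a0", "v1"), Node("b0", "v1"), Node("d0", "v1"))]
new_table = [
    (Node("a0", "v1"), Node("b0", "v1"), Node("d0", "v2")),  # d0 was modified
    (Node("x0", "v1"), Node("y0", "v1")),                    # unrelated tuple
]
joined = web_join(old_table, new_table)
```

In this example only the first new tuple joins with the old one, through the unchanged nodes a0 and b0; the modified d0 does not count as joinable.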
![Page 33: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/33.jpg)
Web Join
• Join the web tables Drugs and New Drugs
• Nodes which have not undergone any changes are the joinable nodes in these two web tables.
• Content-modified nodes, new nodes and deleted nodes cannot be joinable nodes
![Page 34: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/34.jpg)
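Joined web table
Joined web tuples (node identifiers: disease, drug):
• (1) a0, b0, u0, k0, d0: AIDS, Indavir; joined with a0: AIDS
• (2) a0, b0, u1, k1, d1: AIDS, Ritonavir; joined with a0: AIDS
• (3) a0, b0, u0, k0, d0: AIDS, Indavir; joined with a0, u1, k1, d1: AIDS, Ritonavir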
![Page 35: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/35.jpg)
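Joined Web Table, continued
Joined web tuples (node identifiers: disease, drug):
• (4) a0, b2, u3, k4, d7: Heart Disorder, Niacin; joined with a0, u2, k3, d3: Heart Disease, Hirudin
• (5) a0, b4, u7: Impotence, Caverject; joined with a0, b4, u7, u8, k7, d6: Impotence, Caverject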
![Page 36: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/36.jpg)
Joined Table, continued
Joined web tuples (node identifiers: disease, drug):
• (6) a0, b2, u2, k3, d3: Heart Disease, Hirudin; joined with a0, u2, k3, d3: Heart Disorder, Hirudin
![Page 37: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/37.jpg)
Types of web tuples
• Web tuples in which all the nodes are joinable
  • Results of joining two versions of web tuples that have remained unchanged during the transition
• Web tuples in which some of the nodes are joinable
  • The remaining nodes are the result of insertion, deletion or modification operations
  • Example, joined tuple (5): a0, b4, u7 (Impotence, Caverject) joined with a0, b4, u7, u8, k7, d6 (Impotence, Caverject)
![Page 38: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/38.jpg)
Types of web tuples
• Tuples in which
  • Some of the nodes are joinable nodes
  • Out of the remaining nodes, some are the result of insertion, deletion or modification, and
  • The remaining ones remained unchanged during the transition
  • Example, joined tuple (3): a0, b0, u0, k0, d0 (AIDS, Indavir) joined with a0, u1, k1, d1 (AIDS, Ritonavir)
![Page 39: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/39.jpg)
Outer Web Join
• Web tuples that do not participate in the web join process (dangling web tuples) are absent from the joined web table
• Outer web join enables us to identify them
  • Left outer web join
  • Right outer web join
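A minimal sketch of how dangling web tuples could be collected, assuming (as a simplification) that a web tuple is a plain tuple of `(url, last_modified)` pairs and that a tuple participates in the join exactly when it shares at least one identical node with some tuple of the other table. The function names and the sample node identifiers are hypothetical.

```python
def dangling_tuples(table, other_table):
    """Tuples of `table` that share no identical node with `other_table`."""
    # Collect every node of the other table once, then test each tuple.
    other_nodes = set().union(*other_table)
    return [t for t in table if other_nodes.isdisjoint(t)]


def left_outer_web_join(old_table, new_table):
    # Dangling tuples of the old table.
    return dangling_tuples(old_table, new_table)


def right_outer_web_join(old_table, new_table):
    # Dangling tuples of the new table.
    return dangling_tuples(new_table, old_table)


# Hypothetical nodes written as (url, last_modified) pairs.
old = [
    (("a0", "v1"), ("b3", "v1")),
    (("c0", "v1"), ("b0", "v1")),
]
new = [
    (("c0", "v1"), ("b0", "v2")),
    (("a0", "v2"), ("b6", "v1")),
]
left = left_outer_web_join(old, new)
right = right_outer_web_join(old, new)
```

Here the first old tuple and the second new tuple are dangling: none of their nodes appears unchanged on the other side.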
![Page 40: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/40.jpg)
Web Table New Drugs (15th Feb)
Web tuples (node identifiers: disease, drug):
• a0, b0, u0, k0, d0: AIDS, Indavir
• a0, b0, u1, k1, d1: AIDS, Ritonavir
• a0, b1, k2, d2: Cancer, Beta Carotene
• a0, b2, u2, k3, d3: Heart Disorder, Hirudin
![Page 41: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/41.jpg)
Web Table New Drugs (15th Feb), continued
Web tuples (node identifiers: disease, drug):
• a0, b2, u3, k7, d7: Heart Disorder, Niacin
• a0, b4, u7, k7, d6: Impotence, Caverject
• a0, b4, u9, k8, d8: Impotence, Vasomax
![Page 42: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/42.jpg)
Web Table New Drugs (15th Feb), continued
Web tuples (node identifiers: disease, drug):
• a0, b6, u10, k10, d10: Parkinson’s Disease, Tolcapone
• a0, b4, u12, k9, d9: Impotence, Viagra
![Page 43: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/43.jpg)
Right Outer Web Join
Web tuples (node identifiers: disease, drug):
• a0, b1, k2, d2: Cancer, Beta Carotene
• a0, b4, u9, k8, d8: Impotence, Vasomax
• a0, b4, u12, k9, d9: Impotence, Viagra
• a0, b6, u10, k10, d10: Parkinson’s Disease, Tolcapone
![Page 44: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/44.jpg)
Types of web tuples
• New web tuples which are added during the transition
  • These tuples contain some new nodes, and the content of the remaining nodes has changed
• Tuples in which all the nodes have undergone content modification
• Tuples which existed before and in which some of the nodes are new and the content of the remaining ones has changed.
![Page 45: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/45.jpg)
Web Table Drugs (15th Jan)
Web tuples (node identifiers: disease, drug):
• a0, b0, u0, k0, d0: AIDS, Indavir
• a0, b0, u1, k1, d1: AIDS, Ritonavir
• a0, b1, k2, d2: Cancer, Beta Carotene
• a0, b5, k12, d12: Alzheimer’s Disease, Ibuprofen
![Page 46: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/46.jpg)
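Web Table Drugs (15th Jan), continued
Web tuples (node identifiers: disease, drug):
• a0, b3, d4, k5: Diabetes, Albuterol
• a0, b4, u4, u5, u6, k6, d5: Impotence, Vasomax
• a0, b4, u7, u8, k7, d6: Impotence, Caverject
• a0, b2, u2, k3, d3: Heart Disease, Hirudin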
![Page 47: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/47.jpg)
Left Outer Web Join
Web tuples (node identifiers: disease, drug):
• a0, b1, k2, d2: Cancer, Beta Carotene
• a0, b5, k12, d12: Alzheimer’s Disease, Ibuprofen
• a0, b3, d4, k5: Diabetes, Albuterol
• a0, b4, u4, u5, u6, k6, d5: Impotence, Vasomax
![Page 48: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/48.jpg)
Types of web tuples
• Web tuples which are deleted during the transition
  • These tuples do not occur in the new web table
• Tuples in which all the nodes have undergone content modification
• Tuples in which some of the nodes are deleted and the content of the remaining ones has changed.
![Page 49: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/49.jpg)
Step 3: Generating Delta Web Tables
![Page 50: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/50.jpg)
Overview
• Input: joined, left outer joined and right outer joined web tables
• Output: set of delta web tables
![Page 51: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/51.jpg)
Delta Web Tables
• Delta web tables are used to represent web deltas
• They encapsulate the relevant changes that have occurred in the Web with respect to a user’s query
• Three types
  • Delta+ web table: a set of web tuples containing new nodes inserted during the transition
  • Delta- web table: a set of web tuples containing nodes removed during the transition
  • Delta-M web table: a set of web tuples representing the previous and current sets of modified nodes
![Page 52: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/52.jpg)
Steps for Generation
• Phase 1: Delta Nodes Identification Phase
  • Nodes which are added, deleted or modified during the transition are identified
  • Input: old and new versions of the web tables and a set of joinable nodes from the joined web table
  • Output: sets of nodes which are added, deleted or modified during the transition
    • Nodes which exist in the new web table but not in the old web table are the new nodes
    • Nodes which exist in the old web table but not in the new one are the deleted nodes
    • Nodes which exist in both web tables but are not joinable are the nodes which have undergone content modification
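The three rules of Phase 1 reduce to set operations once nodes are identified by URL. The sketch below assumes web tuples are plain tuples of node identifiers standing in for URLs; the function name and the sample identifiers are illustrative, not taken from the WHOWEDA implementation.

```python
def identify_delta_nodes(old_table, new_table, joinable_urls):
    """Return (added, deleted, modified) sets of node URLs."""
    old_urls = {url for web_tuple in old_table for url in web_tuple}
    new_urls = {url for web_tuple in new_table for url in web_tuple}
    added = new_urls - old_urls        # in the new table only
    deleted = old_urls - new_urls      # in the old table only
    # Present in both versions but not joinable: content was modified.
    modified = (old_urls & new_urls) - set(joinable_urls)
    return added, deleted, modified


# Hypothetical identifiers; which nodes are joinable is assumed.
old_table = [("a0", "b3", "d4"), ("a0", "b0", "d0")]
new_table = [("a0", "b0", "d0"), ("a0", "b6", "d10")]
added, deleted, modified = identify_delta_nodes(old_table, new_table, {"b0"})
```

With this input, b6 and d10 are new, b3 and d4 are deleted, and a0 and d0 are reported as modified because they occur in both tables without being joinable.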
![Page 53: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/53.jpg)
Steps for Generation
Phase 2: Delta Tuples Identification Phase
Determines how the delta nodes are related to one another and how they are associated with those nodes which have remained unchanged
We identify those tuples which contain nodes which are added, deleted or modified during the transition
Input: Joined, left outer joined and right outer joined web tables, sets of delta nodes
Output: Sets of web tuples represented by Delta+, Delta- and Delta-M web tables
![Page 54: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/54.jpg)
Phase 2 (Delta+ Web Table)
• Scan the joined and right outer joined web tables to identify web tuples containing nodes which are inserted during the transition
• New nodes can occur in these tables only in two cases
  • In the right outer joined table, if the remaining nodes in the tuple containing the new nodes are modified (hence not joinable)
  • In the joined web table, if some of the nodes in the tuple containing new nodes have remained unchanged and hence are joinable
• These web tuples are stored in the Delta+ Web Table
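The Delta+ scan can be sketched as a filter over the two tables. The node identifiers below follow the slides' example, but which of them count as new is an assumption made for illustration, joined tuples are flattened to a single tuple of identifiers, and the function name is hypothetical.

```python
def build_delta_plus(joined_table, right_outer_table, new_nodes):
    """Keep every web tuple that contains at least one newly inserted node."""
    delta_plus = []
    for web_tuple in list(joined_table) + list(right_outer_table):
        if new_nodes & set(web_tuple):
            delta_plus.append(web_tuple)
    return delta_plus


# Joined tuple (4), flattened: Niacin part plus Hirudin part.
joined_table = [("a0", "b2", "u3", "k4", "d7", "u2", "k3", "d3")]
right_outer_table = [
    ("a0", "b1", "k2", "d2"),           # Cancer, Beta Carotene
    ("a0", "b4", "u9", "k8", "d8"),     # Impotence, Vasomax
    ("a0", "b4", "u12", "k9", "d9"),    # Impotence, Viagra
    ("a0", "b6", "u10", "k10", "d10"),  # Parkinson's Disease, Tolcapone
]
# Assumed output of Phase 1 (not stated in the slides).
new_nodes = {"u3", "d7", "u9", "k8", "d8", "u12", "k9", "d9",
             "b6", "u10", "k10", "d10"}
delta_plus = build_delta_plus(joined_table, right_outer_table, new_nodes)
```

The Cancer tuple is dropped because it contains no new node. The sketch keeps the whole joined tuple, whereas the slides' Delta+ table appears to store only the part that holds the new nodes. The Delta- scan has the same shape, using the left outer joined table and the set of deleted nodes.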
![Page 55: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/55.jpg)
Example (Right Outer Web Join)
Web tuples (node identifiers: disease, drug):
• a0, b1, k2, d2: Cancer, Beta Carotene
• a0, b4, u9, k8, d8: Impotence, Vasomax
• a0, b4, u12, k9, d9: Impotence, Viagra
• a0, b6, u10, k10, d10: Parkinson’s Disease, Tolcapone
![Page 56: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/56.jpg)
Example (Joined Web Table)
Joined web tuples (node identifiers: disease, drug):
• (4) a0, b2, u3, k7, d7: Heart Disorder, Niacin; joined with a0, u2, k3, d3: Heart Disease, Hirudin
![Page 57: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/57.jpg)
Delta+ Web Table
Web tuples (node identifiers: disease, drug):
• a0, b4, u9, k8, d8: Impotence, Vasomax
• a0, b4, u12, k9, d9: Impotence, Viagra
• a0, b6, u10, k10, d10: Parkinson’s Disease, Tolcapone
• a0, b2, u3, k7, d7: Heart Disorder, Niacin
![Page 58: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/58.jpg)
Phase 2 (Delta- Web Table)
• Scan the joined and left outer joined web tables to identify web tuples containing nodes which are deleted during the transition
• Deleted nodes can occur in these tables only in two cases
  • In the left outer joined table, if the remaining nodes in the tuple containing the deleted nodes are modified (hence not joinable)
  • In the joined web table, if some of the nodes in the tuple containing deleted nodes have remained unchanged and hence are joinable
• These web tuples are stored in the Delta- Web Table
![Page 59: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/59.jpg)
Example (Left Outer Web Join)
Web tuples (node identifiers: disease, drug):
• a0, b1, k2, d2: Cancer, Beta Carotene
• a0, b5, k12, d12: Alzheimer’s Disease, Ibuprofen
• a0, b3, d4, k5: Diabetes, Albuterol
• a0, b4, u4, u5, u6, k6, d5: Impotence, Vasomax
![Page 60: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/60.jpg)
Example (Joined Web Table)
Joined web tuples (node identifiers: disease, drug):
• (5) a0, b4, u7: Impotence, Caverject; joined with a0, b4, u7, u8, k7, d6: Impotence, Caverject
![Page 61: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/61.jpg)
Delta- Web Table
Web tuples (node identifiers: disease, drug):
• a0, b5, k12, d12: Alzheimer’s Disease, Ibuprofen
• a0, b3, d4, k5: Diabetes, Albuterol
• a0, b4, u4, u5, u6, k6, d5: Impotence, Vasomax
• a0, b4, u7, u8, k7, d6: Impotence, Caverject
![Page 62: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/62.jpg)
Phase 2 (Delta-M Web Table)
• Finally, nodes which are modified during the transition can be identified by inspecting all three web tables
• Tuples in the left and right outer joined tables which do not contain any new or deleted node represent the old and new versions of these nodes respectively
  • These tuples do not occur in the joined web table as all the nodes are modified
• Tuples in the left and right outer joined tables that contain modified nodes as well as inserted or deleted nodes
  • These modified nodes may not appear in the joined web table if no other joinable web tuples contain these modified nodes
![Page 63: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/63.jpg)
Example (Right Outer Web Join)
Web tuples (node identifiers: disease, drug):
• a0, b1, k2, d2: Cancer, Beta Carotene
• a0, b4, u9, k8, d8: Impotence, Vasomax
• a0, b4, u12, k9, d9: Impotence, Viagra
• a0, b6, u10, k10, d10: Parkinson’s Disease, Tolcapone
![Page 64: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/64.jpg)
Example (Left Outer Web Join)
Web tuples (node identifiers: disease, drug):
• a0, b1, k2, d2: Cancer, Beta Carotene
• a0, b5, k12, d12: Alzheimer’s Disease, Ibuprofen
• a0, b3, d4, k5: Diabetes, Albuterol
• a0, b4, u4, u5, u6, k6, d5: Impotence, Vasomax
![Page 65: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/65.jpg)
Phase 2
• Tuples in the joined web table where some of the nodes represent the old and new versions of modified nodes
• These web tuples are stored in the Delta-M Web Table
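One of the Delta-M cases, pairing the old and new versions of a tuple whose nodes were all modified, can be sketched as follows. The matching rule (same set of node identifiers on both sides) and the sets of added and deleted nodes are assumptions for illustration; the slides do not spell out how the two versions are matched.

```python
def pair_modified_versions(left_outer, right_outer, added, deleted):
    """Pair old and new versions of tuples that have no new or deleted node."""
    pairs = []
    for old_tuple in left_outer:
        if deleted & set(old_tuple):
            continue  # contains a deleted node: handled by the Delta- scan
        for new_tuple in right_outer:
            if added & set(new_tuple):
                continue  # contains a new node: handled by the Delta+ scan
            # Assumed matching rule: both versions cover the same node URLs.
            if set(old_tuple) == set(new_tuple):
                pairs.append((old_tuple, new_tuple))
    return pairs


left_outer = [
    ("a0", "b1", "k2", "d2"),                    # Cancer, Beta Carotene
    ("a0", "b5", "k12", "d12"),                  # Alzheimer's Disease, Ibuprofen
    ("a0", "b3", "d4", "k5"),                    # Diabetes, Albuterol
    ("a0", "b4", "u4", "u5", "u6", "k6", "d5"),  # Impotence, Vasomax
]
right_outer = [
    ("a0", "b1", "k2", "d2"),                    # Cancer, Beta Carotene
    ("a0", "b4", "u9", "k8", "d8"),              # Impotence, Vasomax
    ("a0", "b4", "u12", "k9", "d9"),             # Impotence, Viagra
    ("a0", "b6", "u10", "k10", "d10"),           # Parkinson's Disease, Tolcapone
]
# Assumed output of Phase 1 (not stated in the slides).
deleted = {"b5", "k12", "d12", "b3", "d4", "k5", "u4", "u5", "u6", "k6", "d5"}
added = {"u9", "k8", "d8", "u12", "k9", "d9", "b6", "u10", "k10", "d10"}
pairs = pair_modified_versions(left_outer, right_outer, added, deleted)
```

Under these assumptions the only pair is the old and new Cancer tuple, which matches the Cancer entry shown in the slides' Delta-M web table.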
![Page 66: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/66.jpg)
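Example (Joined web table)
Joined web tuples (node identifiers: disease, drug):
• (1) a0, b0, u0, k0, d0: AIDS, Indavir; joined with a0: AIDS
• (2) a0, b0, u1, k1, d1: AIDS, Ritonavir; joined with a0: AIDS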
![Page 67: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/67.jpg)
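Delta-M Web Table
Web tuples (node identifiers: disease, drug):
• (1) a0, b0, u0, k0, d0: AIDS, Indavir; paired with a0: AIDS
• (2) a0, b0, u1, k1, d1: AIDS, Ritonavir; paired with a0: AIDS
• (3) a0, b4, u7: Impotence, Caverject; paired with a0, b4, u7, u8, k7, d6: Impotence, Caverject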
![Page 68: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/68.jpg)
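Delta-M Web Table, continued
Web tuples (node identifiers: disease, drug):
• (4) a0, b2, u2, k3, d3: Heart Disease, Hirudin; paired with a0, u2, k3, d3: Heart Disorder, Hirudin
• (5) a0, b1, k2, d2: Cancer, Beta Carotene; paired with a0, b1, k2, d2: Cancer, Beta Carotene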
![Page 69: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/69.jpg)
Applications
• Provides the framework for
  • Trend analysis
  • E-commerce
    • Consumer behaviour
    • Product comparisons
    • Competitive intelligence
    • Notification services
    • A useful database for buyer and seller agents
![Page 70: Detecting and Representing Relevant Page-Level Web Deltas Sanjay Kumar Madria Department of Computer Science Purdue University West Lafayette, IN 47907.](https://reader038.fdocuments.us/reader038/viewer/2022110322/56649d3e5503460f94a172da/html5/thumbnails/70.jpg)
Future Work
• Analytical and empirical studies of the algorithms for generating delta web tables
• Mechanism to distinguish between the modified, new or deleted nodes
  • Annotation on delta nodes
• Extend to sub-page level
• Query languages for querying the changes
• Change notification service