Information Management on the World-Wide Web
![Page 1: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/1.jpg)
Information Management on the World-Wide Web
Junghoo “John” Cho
UCLA Computer Science
![Page 2: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/2.jpg)
The Web and Information Galore
![Page 3: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/3.jpg)
10 Years Ago
Reading papers for research
– Stacks of papers
– Long wait
![Page 4: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/4.jpg)
With Web
![Page 5: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/5.jpg)
Challenges (1)
Information overload
– Too much information, too little time
![Page 6: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/6.jpg)
Information Overload
“XML” to Google
– 14 million matching documents!
“XML” to Amazon
– 464 matching books!
Which one to read?
![Page 7: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/7.jpg)
Challenges (2)
Hidden Web
– Not indexed by search engines
– “Hidden” from an average user
– Browse every site manually?
…
![Page 8: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/8.jpg)
Challenges (3)
Transience
![Page 9: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/9.jpg)
Challenges (4)
Scattered & unstructured data
– All Computer Science faculty members and graduate students in the US?
![Page 10: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/10.jpg)
Projects In Our Group
Web Archive
Hidden Web Integration
Page Ranking Algorithm
User Recommendation System
![Page 11: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/11.jpg)
User Recommendation System
464 books on XML
Which one to read?
– The one that my colleagues and friends recommend?
![Page 12: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/12.jpg)
Amazon’s Recommendation System
1 – 5 star ratings by individual users
Books can be sorted by “average user rating”
![Page 13: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/13.jpg)
My Typical Scenario
Sort books by their average user rating
Browse top 20 books to decide what to read
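The scenario above amounts to a sort on the mean rating. A minimal Python sketch follows; the titles and ratings are made up for illustration and are not Amazon data. It shows that such a sort takes no account of how many users rated each book:

```python
from statistics import mean

# Hypothetical data: book title -> list of 1-5 star ratings.
ratings = {
    "Book A": [5],                    # a single 5-star rating
    "Book B": [5] * 90 + [4] * 10,    # 100 ratings averaging 4.9
    "Book C": [4, 3, 4],
}

def sort_by_average(ratings_by_book):
    """Return titles ordered by mean rating, highest first."""
    return sorted(ratings_by_book,
                  key=lambda title: mean(ratings_by_book[title]),
                  reverse=True)

ranking = sort_by_average(ratings)
print(ranking)
```

Here "Book A" is ranked ahead of "Book B" on the strength of one rating.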
![Page 14: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/14.jpg)
Questions
Is “5 star” by one user better than “4.9 star” by 100 users?
– Intuitively, I prefer 4.9 star by 100 users
– More “reliable” rating
How much can I trust the rating of a particular person?
– How do I know that the person’s rating is reliable?
![Page 15: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/15.jpg)
Our Approach
“Inherent quality” or “rating” of a book
– How many users would recommend the book (i.e., give a high rating) if all users had read the book?
More user ratings, more information on the “quality” of the book
– An average user is likely to give a high rating to a high-quality book
![Page 16: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/16.jpg)
Probabilistic Rating Model
How likely is it that the book has a “4-star rating”?
– Rating probability distribution
[Figure: probability density (0 to 1) vs. book rating/quality (1 to 5)]
![Page 17: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/17.jpg)
Update of Rating Probability
As more users provide ratings, we update our probability distribution
[Figure: probability density (0 to 1) vs. book rating/quality (1 to 5)]
![Page 18: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/18.jpg)
Update of Rating Probability
As more users provide ratings, we update our probability distribution
[Figure: probability density (0 to 1) vs. book rating/quality (1 to 5), after a five-star rating by a user]
![Page 19: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/19.jpg)
Update of Rating Probability
As more users provide ratings, we update our probability distribution
[Figure: probability density (0 to 1) vs. book rating/quality (1 to 5), after a one-star rating by a user]
![Page 20: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/20.jpg)
Update of Rating Probability
As more users provide ratings, we update our probability distribution
[Figure: probability density (0 to 1) vs. book rating/quality (1 to 5), after many ratings]
![Page 21: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/21.jpg)
Bayesian Inference Theory
Given a user rating UR, what is the inherent rating IR?
$$P(IR \mid UR) = \frac{P(UR \mid IR)\, P(IR)}{P(UR)}$$
– P(IR): probability of the book rating BEFORE the user rating
– P(IR | UR): probability of the book rating AFTER the user rating
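Bayes' rule on this slide can be tried out on a discrete grid of ratings. The sketch below is only an illustration. The uniform prior and the likelihood (which halves for every star of distance between UR and IR) are assumptions made here, not the model used in the project.

```python
LEVELS = [1, 2, 3, 4, 5]  # possible values of the inherent rating IR and user rating UR

def likelihood(ur, ir):
    """Assumed P(UR = ur | IR = ir): halves for each star of distance from ir."""
    normalizer = sum(0.5 ** abs(u - ir) for u in LEVELS)
    return 0.5 ** abs(ur - ir) / normalizer

def update(prior, ur):
    """Apply Bayes' rule once; returns P(IR | UR = ur) as {ir: probability}."""
    unnormalized = {ir: likelihood(ur, ir) * prior[ir] for ir in LEVELS}
    evidence = sum(unnormalized.values())  # P(UR = ur)
    return {ir: p / evidence for ir, p in unnormalized.items()}

# Assumed prior: no information about the book before any rating.
uniform = {ir: 1 / len(LEVELS) for ir in LEVELS}

# Distribution after a single five-star rating.
one_five_star = update(uniform, 5)

# Distribution after 100 ratings averaging 4.9 stars.
many = uniform
for ur in [5] * 90 + [4] * 10:
    many = update(many, ur)

print({ir: round(p, 3) for ir, p in one_five_star.items()})
print({ir: round(p, 3) for ir, p in many.items()})
```

Under these assumptions, a single five-star rating leaves substantial probability on lower inherent ratings, while the 100 ratings concentrate almost all of the probability on IR = 5.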
![Page 22: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/22.jpg)
User Model
The characteristics of a user
Sensitivity: slope of the curve
+1: good, –1: bad, 0: not useful
[Figure: two panels, “Good” and “Bad”; user rating (1 to 5) vs. book quality (1 to 5)]
![Page 23: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/23.jpg)
User Model
The characteristics of a user
Bias: average “height” of the curve
[Figure: two panels, “Positive bias” and “Negative bias”; user rating (1 to 5) vs. book quality (1 to 5)]
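Sensitivity and bias can be read off a line fitted through a user's (book quality, user rating) points. The slides do not say how the curve is estimated. The sketch below assumes an ordinary least-squares line, assumes the book qualities are already known, and uses made-up users. Bias is computed here as the mean rating minus the mean quality, which is one possible reading of the average “height” of the curve.

```python
def fit_user_model(book_quality, user_ratings):
    """Fit a least-squares line to (quality, rating) points for one user.

    Returns (sensitivity, bias): sensitivity is the slope of the line and
    bias is the mean rating minus the mean quality.
    """
    n = len(book_quality)
    mean_q = sum(book_quality) / n
    mean_r = sum(user_ratings) / n
    var_q = sum((q - mean_q) ** 2 for q in book_quality)
    cov = sum((q - mean_q) * (r - mean_r)
              for q, r in zip(book_quality, user_ratings))
    return cov / var_q, mean_r - mean_q

quality = [1, 2, 3, 4, 5]             # hypothetical known book qualities
good_user = [1, 2, 3, 4, 5]           # ratings track quality
bad_user = [5, 4, 3, 2, 1]            # ratings run opposite to quality
generous_user = [2, 3, 4, 5, 5]       # ratings sit above quality
uninformative_user = [3, 1, 5, 1, 3]  # ratings unrelated to quality

for name, user in [("good", good_user), ("bad", bad_user),
                   ("generous", generous_user),
                   ("uninformative", uninformative_user)]:
    sensitivity, bias = fit_user_model(quality, user)
    print(f"{name}: sensitivity={sensitivity:+.2f}, bias={bias:+.2f}")
```

The good and bad users come out with slopes of +1 and –1, the generous user shows a positive bias, and the uninformative user has a slope of 0.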
![Page 24: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/24.jpg)
Iterative Model Refinement
As more users rate a book, we get better estimates of book quality
As we estimate book quality better, we get a better idea of a user’s sensitivity and bias
![Page 25: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/25.jpg)
Iterative Model Refinement
[Diagram: User-provided Rating, Book Rating Estimate, User Characteristics]
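One way to picture the refinement loop is to alternate between the two estimates. The sketch below is an assumption-laden illustration, not the project's algorithm. Each user is modeled as a least-squares line (rating = slope × quality + intercept), book quality is re-estimated by least squares from those lines, and the data and the names `fit_line` and `refine` are hypothetical.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:  # nothing can be learned about this user yet
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return slope, mean_y - slope * mean_x

def refine(ratings, iterations=5):
    """ratings maps (user, book) -> stars; returns (quality, user_models)."""
    by_book, by_user = defaultdict(dict), defaultdict(dict)
    for (user, book), stars in ratings.items():
        by_book[book][user] = stars
        by_user[user][book] = stars

    # Start from the plain average rating of each book.
    quality = {b: sum(r.values()) / len(r) for b, r in by_book.items()}
    models = {}
    for _ in range(iterations):
        # Step 1: given the book estimates, fit each user's line.
        for user, rated in by_user.items():
            books = list(rated)
            models[user] = fit_line([quality[b] for b in books],
                                    [rated[b] for b in books])
        # Step 2: given the user lines, re-estimate each book's quality.
        # A user with slope 0 gets zero weight in both sums.
        for book, rated in by_book.items():
            num = sum(models[u][0] * (r - models[u][1]) for u, r in rated.items())
            den = sum(models[u][0] ** 2 for u in rated)
            if den > 0:
                quality[book] = min(5.0, max(1.0, num / den))
    return quality, models

# Hypothetical ratings: u1 and u2 agree; u3's ratings are unrelated to theirs.
stars = {"u1": [1, 2, 4, 5], "u2": [1, 2, 4, 5], "u3": [4, 2, 2, 4]}
ratings = {(u, f"b{i + 1}"): s for u, row in stars.items() for i, s in enumerate(row)}

quality, models = refine(ratings)
print({b: round(q, 2) for b, q in sorted(quality.items())})
print({u: round(m[0], 3) for u, m in sorted(models.items())})
```

With this data the plain averages tie b1 and b2. After a few rounds the estimate for b1 falls below b2, and the slope fitted for u3 shrinks toward 0, so u3 has little influence on the estimates. The sketch does not pin down the overall scale of the quality estimates.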
![Page 26: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/26.jpg)
Final Recommendation
Recommend the book with the highest expected rating
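The expected rating of a book is the probability-weighted mean of its rating distribution. The sketch below picks the book with the highest such value; the three distributions are invented for illustration, and `recommend` is a hypothetical helper name.

```python
def expected_rating(dist):
    """Mean of a distribution given as {rating: probability}."""
    return sum(rating * p for rating, p in dist.items())

# Hypothetical rating distributions for three books.
posteriors = {
    "Book A": {1: 0.04, 2: 0.06, 3: 0.11, 4: 0.23, 5: 0.56},  # few ratings, wide spread
    "Book B": {1: 0.00, 2: 0.00, 3: 0.01, 4: 0.09, 5: 0.90},  # many high ratings
    "Book C": {1: 0.05, 2: 0.15, 3: 0.40, 4: 0.30, 5: 0.10},
}

def recommend(posteriors_by_book):
    """Return the title whose distribution has the highest expected rating."""
    return max(posteriors_by_book,
               key=lambda title: expected_rating(posteriors_by_book[title]))

best = recommend(posteriors)
print(best, round(expected_rating(posteriors[best]), 2))
```

Because the expectation accounts for the whole distribution, a book whose probability is concentrated near 5 beats one whose distribution is still spread out.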
![Page 27: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/27.jpg)
Initial Results
Our system prefers a 4.9-star book by 100 people to a 5-star book by 1 user
If a user gives random ratings, the system ignores the user’s rating
More thorough evaluation on the way
![Page 28: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/28.jpg)
Other Projects
Web Archive
Hidden Web Integration
Page Ranking Algorithm
![Page 29: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/29.jpg)
Ph.D. Students on the Projects
Alex Ntoulas
Rob Adams
Victor Liu
– In Dr. Chu’s group
![Page 30: Information Management on the World-Wide Web](https://reader035.fdocuments.us/reader035/viewer/2022062806/56814f5a550346895dbd0b11/html5/thumbnails/30.jpg)
Thank You
Questions?