HFWEB June 19, 2000
Quantitative Measures for Distinguishing Web Pages
Melody Y. Ivory Rashmi R. Sinha Marti A. Hearst
UC Berkeley
Research Goals
Identify key Web page aspects that impact usability
  – Easily-quantified measures for information-centric sites
Examine their effect through user studies
  – Establish concrete thresholds
Incorporate findings into a simulation tool (Web TANGO)
  – Mimic Web site usage, report quantitative results
  – Enable comparison of alternative designs
Outline
Methodology
Data Analysis
Predicting Web Page Rating
Wrap-up
Methodology
Collect quantitative measures from 2 groups
  – Ranked: Sites rated favorably via expert review or user ratings
  – Unranked: Sites that have not been rated favorably
Statistically compare the groups
Predict group membership
Methodology: Quantitative Measures
Identified 42 aspects from the literature
  – Page Composition (e.g., words, links, images)
  – Page Formatting (e.g., fonts, lists, colors)
  – Overall Page Characteristics (e.g., information and layout quality, download speed)
Page composition & formatting aspects are easier to measure
Methodology: Metrics Selected
  – Word Count
  – Body Text Percentage
  – Emphasized Body Text Percentage
  – Text Positioning Count
  – Text Cluster Count
  – Link Count
  – Page Size
  – Graphic Percentage
  – Graphics Count
  – Color Count
  – Font Count
  – Reading Complexity
We measured 1/2 of the aspects
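The Reading Complexity metric is reported later in the deck as a Gunning Fog Index (GFI). A minimal sketch of the standard GFI formula, 0.4 × (average sentence length + percentage of words with three or more syllables), is given here. The syllable counter is a crude vowel-group heuristic and the function names are illustrative assumptions, not the authors' metrics tool.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per run of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def gunning_fog(text: str) -> float:
    """Gunning Fog Index: 0.4 * (words per sentence + percent complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    # "Complex" words are those with three or more syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_len = len(words) / len(sentences)
    pct_complex = 100.0 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_len + pct_complex)
```

A production tool would need a better syllable counter and sentence splitter; the heuristic here only shows how the two terms combine.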
Methodology: Data Collection
Collected data for 2,015 English & non-English information-centric pages from 463 sites
  – Education, government, newspaper, etc.
Data constraints
  – At least 30 words
  – No e-commerce pages
  – Exhibit high self-containment (i.e., no style sheets, scripts, applets, etc.)
1,054 pages fit constraints (52%)
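The data constraints can be read as a page filter. The sketch below uses Python's standard `html.parser` to count words and to flag style sheets, scripts, and applets. The class and function names are hypothetical, the e-commerce check is left as a caller-supplied flag because the slides do not say how it was decided, and the authors' actual tool may have worked differently.

```python
from html.parser import HTMLParser


class PageScan(HTMLParser):
    """Counts visible words and flags tags that break self-containment."""

    BLOCKING_TAGS = {"script", "style", "applet"}  # assumption: tag-level check

    def __init__(self):
        super().__init__()
        self.words = 0
        self.self_contained = True
        self._skip_depth = 0  # > 0 while inside a blocking tag

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCKING_TAGS:
            self.self_contained = False
            self._skip_depth += 1
        elif tag == "link" and (dict(attrs).get("rel") or "").lower() == "stylesheet":
            self.self_contained = False

    def handle_endtag(self, tag):
        if tag in self.BLOCKING_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside script/style/applet is not counted as page words.
        if not self._skip_depth:
            self.words += len(data.split())


def fits_constraints(html: str, is_ecommerce: bool = False, min_words: int = 30) -> bool:
    """True if the page has enough words, is self-contained, and is not e-commerce."""
    scan = PageScan()
    scan.feed(html)
    scan.close()
    return scan.words >= min_words and scan.self_contained and not is_ecommerce
```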
Methodology: Data Collection
Ranked pages
  – Favorably assessed by expert review or user rating on expert-chosen sites
  – Sources (ER = expert review, UR = user rating):
      – Yahoo! 101 (ER)
      – Web 100 (UR)
      – PC Mag Top 100 (ER)
      – WiseCat’s Top 100 (ER)
      – Webby Awards (ER) & People's Voice (UR)
Methodology: Data Collection
Unranked pages
  – Not favorably assessed by expert review or user rating on expert-chosen sites
  – Do not assume unranked = unfavorable
  – Sources:
      – WebCriteria’s Industry Benchmark
      – Yahoo Business & Economy Category
      – Others
Methodology: Analysis Data
428 pages
  – 214 ranked pages
  – 840 unranked pages
      • 214 chosen randomly
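Balancing the two groups amounts to drawing a random subset of the unranked pages equal in size to the ranked group. A minimal sketch with the standard library follows. The function name and the fixed seed are assumptions added for reproducibility; the slides do not describe the sampling procedure beyond "chosen randomly".

```python
import random


def balanced_sample(ranked, unranked, seed=0):
    """Return all ranked pages plus an equally sized random draw of unranked pages."""
    ranked = list(ranked)
    rng = random.Random(seed)  # seeded only so the draw can be repeated
    # random.sample draws without replacement and raises ValueError
    # if there are fewer unranked pages than ranked ones.
    return ranked, rng.sample(list(unranked), k=len(ranked))
```

With 214 ranked and 840 unranked pages this yields 214 + 214 = 428 pages for analysis.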
Data Analysis: Findings
Several features are significantly associated with ranked sites
Several pairs of features correlate strongly
  – Correlations mean different things in ranked vs. unranked pages
Significant features are partially successful at predicting if site is ranked
Data Analysis: Significant Differences
                                  Mean         Standard Deviation
Metric                        Ranked  Unranked    Ranked  Unranked    Sig.
Word Count                     790.5     585.8    1604.5    1315.7   0.150
Body Text %                     73.7      73.2      22.4      24.5   0.824
Emphasized Body Text %          26.1        25      27.2      25.7   0.672
Text Positioning Count           4.4       5.4       4.8      11.2   0.244
Text Cluster Count              17.9      10.8      22.1      17.4   0.000
Link Count                      58.8      39.2      56.6      44.2   0.000
Page Size (Bytes)            57341.2   39614.9   72024.3     34312   0.001
Graphic %                       53.6      52.8      27.9      29.3   0.756
Graphics Count                  25.1      17.5      28.1      22.5   0.002
Color Count                      8.6       7.4       3.8       3.1   0.001
Font Count                       4.6       4.6       2.7       2.9   0.836
Reading Complexity (GFI)        15.8      19.6       7.8      21.1   0.014
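The slides do not name the significance test behind the Sig. column. As a hedged illustration, the sketch below assumes a two-sample t-test on 214 pages per group (the group size from the Analysis Data slide) and computes the statistic directly from the tabulated means and standard deviations. The p-value uses a normal approximation, which is reasonable at this sample size. The function names are illustrative.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample t statistic from summary statistics (unequal variances)."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


def two_sided_p_normal(t):
    """Two-sided p-value under a normal approximation: 2 * (1 - Phi(|t|))."""
    return math.erfc(abs(t) / math.sqrt(2))


N_PER_GROUP = 214  # assumption: 214 ranked and 214 unranked pages

# Means and standard deviations copied from the Link Count and Word Count rows.
t_link = welch_t(58.8, 56.6, N_PER_GROUP, 39.2, 44.2, N_PER_GROUP)
p_link = two_sided_p_normal(t_link)
t_words = welch_t(790.5, 1604.5, N_PER_GROUP, 585.8, 1315.7, N_PER_GROUP)
p_words = two_sided_p_normal(t_words)
```

Under these assumptions the Link Count difference is clearly significant while the Word Count difference is not, which agrees in direction with the Sig. column.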
Data Analysis: Significant Differences
Ranked pages
  – More text clustering (facilitates scanning)
  – More links (facilitate info-seeking)
  – More bytes (more content facilitates info-seeking)
  – More images (clustering graphics facilitates scanning)
  – More colors (facilitates scanning)
  – Lower reading complexity (close to best numbers in Spool study; facilitates scanning)
Data Analysis: Metric Correlations
Ranked
Metric            Emp. Body Text %  Text Cluster Count  Link Count  Color Count
Link Count                  -0.008               0.516           -        0.201
Graphics Count              -0.040               0.370       0.305        0.331
Color Count                 -0.200               0.447       0.201            -
Font Count                  -0.083               0.315       0.091        0.642

Unranked
Metric            Emp. Body Text %  Text Cluster Count  Link Count  Color Count
Link Count                  -0.077               0.548           -        0.540
Graphics Count              -0.102               0.445       0.525        0.344
Color Count                  0.013               0.610       0.540            -
Font Count                   0.043               0.321       0.366        0.551
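Each entry in the correlation table relates two metrics across the pages of one group. The slides do not say which coefficient was used, so the sketch below assumes Pearson's r. The function name is illustrative and no page data from the study is reproduced.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

To mirror the table, the function would be applied separately to the ranked and unranked groups, for example to the per-page link counts and color counts of each group.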
Data Analysis: Metric Correlations
Hypotheses based on correlations:
  – Ranked Pages
      • Colored display text
      • Link clustering
      Both patterns on all pages in random sample
  – Unranked Pages
      • Display text coloring plus body text emphasis or clustering
      • Link coloring or clustering
      • Image links, simulated image maps, bulleted links
      At least 2 patterns in 70% of random sample
Data Analysis: Example Pages
Ranked Example Unranked Example
                          Ranked                       Unranked
Metric                    Example   Mean  Std. Dev.    Example   Mean  Std. Dev.
Emphasized Body Text %        7.2   26.1       27.2       46.7     25       25.7
Text Cluster Count             17   17.9       22.1         11   10.8       17.4
Link Count                     59   58.8       56.6         24   39.2       44.2
Graphics Count                  4   25.1       28.1         15   17.5       22.5
Color Count                    10    8.6        3.8          6    7.4        3.1
Font Count                      7    4.6        2.7         12    4.6        2.9
Predicting Web Page Ranking
Linear Regression
  – Explains 10% of difference between groups
  – 63% Accuracy (better at unranked prediction)
Employ machine learning techniques
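To make the regression-based prediction concrete, the sketch below fits an ordinary least-squares line to a 0/1 group label using a single metric, then classifies a page as ranked when the fitted value reaches 0.5. This is a simplified assumption: the study's regression presumably used several metrics, and the slides do not give the cutoff. The toy numbers are invented for illustration and are not study data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def accuracy(xs, labels, slope, intercept, cutoff=0.5):
    """Share of pages whose thresholded prediction matches the true label."""
    predictions = [1 if slope * x + intercept >= cutoff else 0 for x in xs]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Invented toy data: one metric value per page, label 1 = ranked, 0 = unranked.
toy_metric = [10, 20, 30, 40, 50, 60]
toy_labels = [0, 0, 0, 1, 1, 1]
slope, intercept = fit_line(toy_metric, toy_labels)
toy_accuracy = accuracy(toy_metric, toy_labels, slope, intercept)
```

The toy data is perfectly separable, so it says nothing about the 63% figure; it only shows the fit-then-threshold mechanics.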
Future Work
New metrics computation tool
  – More quantitative measures
  – Process style sheets
  – Functional categories for pages
  – UI
Repeat data collection and analysis
  – Larger sample of pages
Validation studies with users
More info: www.cs.berkeley.edu/~ivory/research/web/