HFWEB June 19, 2000

Quantitative Measures for Distinguishing Web Pages

Melody Y. Ivory Rashmi R. Sinha Marti A. Hearst

UC Berkeley

Research Goals

Identify key Web page aspects that impact usability
  – Easily-quantified measures for information-centric sites
Examine their effect through user studies
  – Establish concrete thresholds
Incorporate findings into a simulation tool (Web TANGO)
  – Mimic Web site usage, report quantitative results
  – Enable comparison of alternative designs

Outline

Methodology
Data Analysis
Predicting Web Page Rating
Wrap-up

Methodology

Collect quantitative measures from 2 groups
  – Ranked: Sites rated favorably via expert review or user ratings
  – Unranked: Sites that have not been rated favorably
Statistically compare the groups
Predict group membership

Methodology: Quantitative Measures

Identified 42 aspects from the literature
  – Page Composition (e.g., words, links, images)
  – Page Formatting (e.g., fonts, lists, colors)
  – Overall Page Characteristics (e.g., information and layout quality, download speed)
Page composition & formatting aspects are easier to measure

Methodology: Metrics Selected

  – Word Count
  – Body Text Percentage
  – Emphasized Body Text Percentage
  – Text Positioning Count
  – Text Cluster Count
  – Link Count
  – Page Size
  – Graphic Percentage
  – Graphics Count
  – Color Count
  – Font Count
  – Reading Complexity
We measured 1/2 of the aspects
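
Several of the composition metrics listed above are simple counts over a page's HTML. The following is a minimal sketch of how Word Count, Link Count, Graphics Count, and Page Size might be computed with Python's standard html.parser module. It is not the authors' metrics tool; the class name PageMetrics, the function measure, and the counting rules (for example, counting only anchors that carry an href, and ignoring text inside script and style elements) are assumptions made for illustration.

```python
from html.parser import HTMLParser


class PageMetrics(HTMLParser):
    """Collects a few composition counts from one HTML page (illustrative only)."""

    # Text inside these elements is not visible page content, so it is skipped.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.word_count = 0
        self.link_count = 0
        self.graphics_count = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a" and any(name == "href" for name, _ in attrs):
            self.link_count += 1  # assumption: only anchors with href count as links
        elif tag == "img":
            self.graphics_count += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.word_count += len(data.split())  # whitespace-separated tokens


def measure(html):
    """Return a dict of simple page-composition counts for an HTML string."""
    parser = PageMetrics()
    parser.feed(html)
    parser.close()
    return {
        "word_count": parser.word_count,
        "link_count": parser.link_count,
        "graphics_count": parser.graphics_count,
        "page_size_bytes": len(html.encode("utf-8")),
    }
```

Calling measure(page_source) on the text of a downloaded page returns the four counts as a dictionary. Metrics such as Text Cluster Count or Color Count would need layout and attribute rules that the slides do not spell out, so they are left out of this sketch.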

Methodology: Data Collection

Collected data for 2,015 English & non-English information-centric pages from 463 sites
  – Education, government, newspaper, etc.
Data constraints
  – At least 30 words
  – No e-commerce pages
  – Exhibit high self-containment (i.e., no style sheets, scripts, applets, etc.)
1,054 pages fit constraints (52%)

Methodology: Data Collection

Ranked pages
  – Favorably assessed by expert review or user rating on expert-chosen sites
  – Sources (ER = expert review, UR = user rating):
      • Yahoo! 101 (ER)
      • Web 100 (UR)
      • PC Mag Top 100 (ER)
      • WiseCat’s Top 100 (ER)
      • Webby Awards (ER) & People’s Voice (UR)

Methodology: Data Collection

Unranked pages
  – Not favorably assessed by expert review or user rating on expert-chosen sites
  – Do not assume unranked = unfavorable
  – Sources:
      • WebCriteria’s Industry Benchmark
      • Yahoo Business & Economy Category
      • Others

Methodology: Analysis Data

428 pages
  – 214 ranked pages
  – 840 unranked pages
      • 214 chosen randomly
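
The balanced analysis set above (214 ranked pages plus 214 of the 840 unranked pages) can be sketched as a random draw without replacement. This is only an illustration: the function name downsample, the fixed seed, and the integer page identifiers are hypothetical, and the slides do not say how the random selection was actually performed.

```python
import random


def downsample(pages, k, seed=0):
    """Draw k pages without replacement; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(pages, k)


# Hypothetical page identifiers standing in for the collected pages.
ranked_pages = list(range(214))
unranked_pages = list(range(1000, 1840))  # 840 unranked pages

# Shrink the larger group to the size of the smaller one.
unranked_subset = downsample(unranked_pages, len(ranked_pages))
analysis_set = ranked_pages + unranked_subset
```

Matching the group sizes keeps either group from dominating the later comparison and prediction steps.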

Data Analysis: Findings

Several features are significantly associated with ranked sites
Several pairs of features correlate strongly
  – Correlations mean different things in ranked vs. unranked pages
Significant features are partially successful at predicting if site is ranked

Data Analysis: Significant Differences

                                   Mean            Standard Deviation
Metric                      Ranked  Unranked      Ranked  Unranked      Sig.
Word Count                   790.5     585.8      1604.5    1315.7     0.150
Body Text %                   73.7      73.2        22.4      24.5     0.824
Emphasized Body Text %        26.1        25        27.2      25.7     0.672
Text Positioning Count         4.4       5.4         4.8      11.2     0.244
Text Cluster Count            17.9      10.8        22.1      17.4     0.000
Link Count                    58.8      39.2        56.6      44.2     0.000
Page Size (Bytes)          57341.2   39614.9     72024.3     34312     0.001
Graphic %                     53.6      52.8        27.9      29.3     0.756
Graphics Count                25.1      17.5        28.1      22.5     0.002
Color Count                    8.6       7.4         3.8       3.1     0.001
Font Count                     4.6       4.6         2.7       2.9     0.836
Reading Complexity (GFI)      15.8      19.6         7.8      21.1     0.014
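
The slides do not name the significance test behind the Sig. column. As a rough way to see how a group mean, a standard deviation, and a group size combine into such a value, here is a minimal sketch that assumes a two-sample Welch t statistic with 214 pages per group (the group size from the Analysis Data slide) and a normal approximation for the two-sided p-value. The function names are illustrative, and the authors may have used a different test.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic computed from summary statistics of two groups."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal distribution.

    Assumption: with a few hundred pages per group, the normal curve is a
    reasonable stand-in for the t distribution.
    """
    return math.erfc(abs(t) / math.sqrt(2))


N = 214  # pages per group, taken from the Analysis Data slide

# Link Count: means 58.8 vs. 39.2, standard deviations 56.6 vs. 44.2
t_links = welch_t(58.8, 56.6, N, 39.2, 44.2, N)
p_links = two_sided_p_normal(t_links)

# Word Count: means 790.5 vs. 585.8, standard deviations 1604.5 vs. 1315.7
t_words = welch_t(790.5, 1604.5, N, 585.8, 1315.7, N)
p_words = two_sided_p_normal(t_words)
```

Under these assumptions the Link Count statistic comes out near 4.0 with a p-value well below 0.001, and the Word Count p-value comes out at roughly 0.15. Both are in line with the Sig. column, though this agreement does not confirm which test was actually run.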

Data Analysis: Significant Differences

Ranked pages
  – More text clustering (facilitates scanning)
  – More links (facilitates info-seeking)
  – More bytes (more content facilitates info-seeking)
  – More images (clustering graphics facilitates scanning)
  – More colors (facilitates scanning)
  – Lower reading complexity (close to best numbers in Spool study; facilitates scanning)
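
Reading complexity is reported as GFI, which is assumed here to denote the Gunning Fog Index: 0.4 × (average words per sentence + percentage of words with three or more syllables). The sketch below is a rough approximation, not the authors' implementation. The syllable counter is a vowel-group heuristic, sentence splitting is naive, and the function names are illustrative.

```python
import re


def count_syllables(word):
    """Approximate syllables as runs of vowels (heuristic, not linguistically exact)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text):
    """Gunning Fog Index: 0.4 * (words per sentence + 100 * complex words / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    # "Complex" words are those with three or more (approximate) syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))
```

Short sentences built from short words give a low index, and long sentences full of polysyllabic words give a high one, which is the direction of the ranked vs. unranked difference reported above.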

Data Analysis: Metric Correlations

Ranked pages

Metric            Emp. Body Text %   Text Cluster Count   Link Count   Color Count
Link Count                  -0.008                0.516            -         0.201
Graphics Count              -0.040                0.370        0.305         0.331
Color Count                 -0.200                0.447        0.201             -
Font Count                  -0.083                0.315        0.091         0.642

Unranked pages

Metric            Emp. Body Text %   Text Cluster Count   Link Count   Color Count
Link Count                  -0.077                0.548            -         0.540
Graphics Count              -0.102                0.445        0.525         0.344
Color Count                  0.013                0.610        0.540             -
Font Count                   0.043                0.321        0.366         0.551
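
Each coefficient above is computed within one group, which is why the same pair of metrics (for example Link Count and Color Count) can correlate weakly among ranked pages and strongly among unranked pages. A minimal sketch of a per-group Pearson correlation follows. The slides do not state which correlation coefficient was used, so Pearson's r is an assumption, and the page records are made-up values that only show the mechanics.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_group(pages, metric_a, metric_b):
    """Compute the correlation of two metrics separately for each group."""
    result = {}
    for group in sorted({page["group"] for page in pages}):
        rows = [page for page in pages if page["group"] == group]
        result[group] = pearson([r[metric_a] for r in rows],
                                [r[metric_b] for r in rows])
    return result


# Made-up records, not data from the study.
toy_pages = [
    {"group": "ranked", "link_count": 10, "color_count": 4},
    {"group": "ranked", "link_count": 20, "color_count": 6},
    {"group": "ranked", "link_count": 30, "color_count": 8},
    {"group": "unranked", "link_count": 10, "color_count": 9},
    {"group": "unranked", "link_count": 20, "color_count": 7},
    {"group": "unranked", "link_count": 30, "color_count": 5},
]
by_group = correlation_by_group(toy_pages, "link_count", "color_count")
```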

Data Analysis: Metric Correlations

Hypotheses based on correlations:
  – Ranked Pages
      • Colored display text
      • Link clustering
    Both patterns on all pages in random sample
  – Unranked Pages
      • Display text coloring plus body text emphasis or clustering
      • Link coloring or clustering
      • Image links, simulated image maps, bulleted links
    At least 2 patterns in 70% of random sample

Data Analysis: Example Pages

[Screenshots: Ranked Example, Unranked Example]

                                   Ranked                       Unranked
Metric                    Example   Mean   Std. Dev.   Example   Mean   Std. Dev.
Emphasized Body Text %        7.2   26.1        27.2      46.7     25        25.7
Text Cluster Count             17   17.9        22.1        11   10.8        17.4
Link Count                     59   58.8        56.6        24   39.2        44.2
Graphics Count                  4   25.1        28.1        15   17.5        22.5
Color Count                    10    8.6         3.8         6    7.4         3.1
Font Count                      7    4.6         2.7        12    4.6         2.9

Predicting Web Page Ranking

Linear Regression
  – Explains 10% of difference between groups
  – 63% Accuracy (better at unranked prediction)
Employ machine learning techniques
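
The slide gives only the outcome of the regression, not the model. To show how an "explained" share and an accuracy figure can both come out of a linear regression on group membership, here is a minimal sketch that assumes ordinary least squares on a single metric, a 0/1 label (1 = ranked), and a 0.5 cutoff for classification. The predictor, the cutoff, the function names, and the data are all assumptions. The values are made up and are not from the study.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def evaluate(xs, ys, cutoff=0.5):
    """Return (R squared, classification accuracy) for a fitted 0/1 regression."""
    intercept, slope = fit_line(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    # A page is predicted "ranked" when its fitted value reaches the cutoff.
    hits = sum((f >= cutoff) == (y == 1) for y, f in zip(ys, fitted))
    return r_squared, hits / len(ys)


# Made-up link counts and labels (1 = ranked, 0 = unranked), for illustration only.
link_counts = [10, 20, 30, 40, 50, 60, 70, 80]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
r_squared, accuracy = evaluate(link_counts, labels)
```

R squared is one common reading of "explains 10% of difference", and accuracy is the share of pages placed in the correct group. The actual analysis presumably used several of the significant metrics as predictors rather than one.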

Future Work

New metrics computation tool
  – More quantitative measures
  – Process style sheets
  – Functional categories for pages
  – UI
Repeat data collection and analysis
  – Larger sample of pages
Validation studies with users
More info: www.cs.berkeley.edu/~ivory/research/web/

In Summary

Quantitative measures should be helpful for improving information-centric Web pages
  – We can empirically distinguish between ranked and unranked pages