News, Copyright, and Online Aggregators

Preliminary

Lesley Chiou* and Catherine E. Tucker†‡

October 1, 2010

Abstract

This paper examines how the practices of online aggregators of news content affect consumers’ search of news online. The aggregation of content by third-party websites has become a controversial digital copyright issue. On the one hand, aggregators argue that they provide a useful service to consumers and that the amount and substantiality of the news article used in their links is small enough to be considered fair use. On the other hand, content providers argue that such practices represent copyright infringement and that third-party aggregators’ profits from advertising associated with content reduce the potential market for or value of copyrighted material. We empirically examine how the severing of a relationship between a major content provider and a major news aggregator affected consumers’ search for online news. Specifically, we investigate how the removal of all hosted articles by The Associated Press from Google News at the end of 2009 (due to a dispute in licensing negotiations) affected which news sites consumers visited. Our empirical analysis suggests that the removal of The Associated Press’s content was correlated with a decline in subsequent visits to traditional news sites immediately after visiting Google News compared to other news aggregators that continued to host Associated Press content.

* Economics Department, Occidental College, CA.
† MIT Sloan School of Management, MIT, Cambridge, MA.
‡ We thank Christopher Hafer of Experian Hitwise.


  • 1 Introduction

    The Internet has reshaped the news and media industry by providing a wealth of information

    and easy access to news, stories, and events that occur locally and globally. Over 74 million

    users visit newspapers' sites each month, accounting for more than a third of all web users

    (Advertising Age, 2009). With the proliferation of search engines and news sites, consumers

    face an array of options and sources for information. Search engines have responded to

    consumers' demand for information by creating online news aggregators, such as Google

    News and Yahoo! News. These news aggregators feature a collection of stories and headlines

    from various online news sources, which are collated into a single site. As such, aggregators

    offer a convenient place for users to consolidate their news reading.

    We study empirically whether using a news aggregator shifts consumers' consumption of

    news online and whether news aggregators are substitutes or complements for the primary

    sources they feature. Surprisingly, there has been no empirical work that quantifies the

    effect of aggregators on primary news sources, even though the US Copyright Office requires

    proof of fair use and asks users to determine whether their use of the copyrighted material

    reduces its potential market or value. On the one hand, news aggregators may steal

    traffic from media sites if users rely solely on the aggregators' abbreviated descriptions of

    the article and do not visit the source site. In fact, this accusation has been levied by several

    major media organizations, including The Associated Press and News Corporation. Rupert

    Murdoch, chairman of News Corp., has accused news aggregators of stealing content and

    violating copyrights (Sandoval, 2009). Mark Cuban, chairman of HDNet, even referred to

    such practices as "vampiric," saying that "Newspapers are getting their blood sucked by Google

    and content aggregators" (Kaplan, 2010). Moreover, some anecdotal evidence exists that a

    significant fraction of readers scan only the headlines from Google News and do not visit

    the source site (Sullivan, 2010). On the other hand, news aggregators may expose readers

    to news sources and sites that they might otherwise not visit, and therefore generate traffic

    for the primary news source. As argued by Arrington (2010), "When an aggregator puts up

    a link to your site, they are doing you a favor by sending you traffic."

    Therefore, the overall effect of news aggregators on consumers' news-seeking habits is

    an empirical question. Our paper focuses on how the practices of news aggregators affect

    consumers' search for news online. We use a rich dataset of consumers' online search behavior

    and site visits to examine the effect of a change in the content that a major online news

    aggregator displayed from a primary news source. The decision to host

    news sources may be correlated with other factors that influence a consumer's consumption

    of news, so we use a discontinuous event that altered the set of news sources provided by a

    major news aggregator.

    In late December 2009, after a breakdown in licensing negotiations, Google removed all news

    articles by The Associated Press from its news aggregator (Haddad, 2010). We compare

    consumers' site visits before and after this policy relative to traffic from Yahoo! News, which

    continued to provide Associated Press content in this period. Yahoo! News and Google News

    play a large role in the online news market and are among the top 5 news sites visited by

    readers. We find that after Associated Press content was removed from Google News, fewer

    consumers subsequently visited traditional news sites (the sources for much of Associated

    Press content) relative to consumers using Yahoo! News. We checked the robustness of the

    result in a variety of ways. Our finding suggests that the aggregation of news content actually

    complements the original content. In other words, users are more likely to be provoked to

    seek the original source and read further when they come across a story summarized by an

    aggregator, rather than being merely content with the summary.

    Our paper builds on a growing literature that documents how the Internet has affected

    the consumption of traditional news media. Gentzkow (2007) investigated the relationship

    between offline and online newspapers. He found some evidence of complementarity but ultimately

    concluded that this was an artifact of customer heterogeneity and that the provision

    of online news was costly to newspaper print edition revenues. This finding was supported

    in separate research by George (2008) and Filistrucchi (2005). Kaiser and Kongsted (2005)

    find evidence of complementarity for online magazines, which suggests that substitution is

    particularly important for news content rather than more magazine-type content. To our

    knowledge, however, we are the first to study how the practice of online aggregation affects

    online news consumption. Our distinction between traditional media, which is primarily

    local, and news aggregators that have national reach builds on earlier research that has

    documented the importance of the interaction between national and local news distribution

    practices for understanding consumption of news (George and Waldfogel, 2006; Oberholzer-

    Gee and Waldfogel, 2009). Our work also relates to research that has evaluated the conflict

    between digitization and copyright. These studies have focused predominantly on the issues

    relating to the piracy of film and musical content (Rob and Waldfogel, 2006; Oberholzer-Gee

    and Strumpf, 2007; Danaher et al., 2010) by unauthorized distribution channels.

    Our paper has several implications for media markets and public policy. First, a fierce

    debate exists over intellectual property and copyrights for content posted online. Search en-

    gines and aggregators accumulate information from primary sources, and controversy exists

    over whether news aggregators violate existing copyrights and whether content can be pro-

    vided freely by a third party. Secondly, policymakers have long stressed the importance of

    the diversity of news consumption and how consumption of local news in particular encour-

    ages civic engagement. Our results suggest that news aggregators actually provoke readers

    to seek further news. Given that local news sites comprise a substantial fraction of online

    news, aggregators may promote more diversity in consumption patterns.


  • 2 Data and Institutional Setting

    2.1 Contractual Dispute between Google and Associated Press

    The Associated Press, founded in 1846, is one of the most powerful news agencies in the

    world. Since the demise of United Press International, it is the only national news service

    in the US, and its major competitors are now the United Kingdom-based Reuters and the

    France-based Agence-France Presse. It is a cooperative that is owned by various newspapers

    and radio and television stations in the United States. These stakeholders both contribute

    stories to the Associated Press and use material that is written by its staff journalists.

    During the past decade, The Associated Press has been at the forefront of efforts by copyright

    holders to circumscribe fair use for digital content and protect copyright holders' rights. For

    example, in June 2008, the Associated Press invoked the Digital Millennium Copyright Act

    and insisted that various bloggers remove Associated Press content (Ardia, 2008).

    Google News is ranked as the fifth most visited news website by Hitwise. Receiving 2.90%

    of all news site visits, it is the second most popular news aggregator service after Yahoo!

    News, which received 7.09% of all news site visits. Founded in April 2002, Google News

    electronically aggregates different news sources based upon a proprietary algorithm. As of

    December 2009, Google News claimed that it received news content from 25,000 publishers

    across the world and that it sent 1 billion clicks to these publishers every month (Cohen,

    2009). Google News has been supported by advertising revenues in the US since February

    2009. Figure 1 provides a screenshot of Google News. Google News has two noticeable

    features that distinguish it from traditional news sites. First, a variety of sources are

    listed for each story. Second, the order of news is electronically determined based on users'

    preferences, the recency of the story, and the interest it has received from other users.

    Figure 1: Screenshot of Google News

    Note: On June 30, 2010, the formatting of Google News changed somewhat and reduced the ability of users to customize the placement of the columns containing news. Therefore the screenshot above, which was produced after this formatting change, may be slightly different from what users viewed during the period that we study.

    Since both The Associated Press and Google News are key players in the distribution

    of news, it is not surprising that they have forged a partnership. Table 1 summarizes the

    major events of their relationship. We study a discontinuity in this relationship, which was

    engendered by negotiations surrounding the contract renewal at the end of January 2010. As

    part of their existing contract, Google and The Associated Press agreed that Associated Press

    content could be hosted by Google for a period of 30 days. Therefore, if the contract ended

    in January 2010 and was not renewed, Google would have to stop posting new Associated

    Press content 30 days prior to the end of the contract. Presumably, to make this clean

    break a credible outside option, Google did indeed stop posting content for seven weeks

    during these contract negotiations. We should emphasize that this is necessarily based on

    the observations of industry outsiders, since both Google and the Associated Press signed

    binding non-disclosure agreements, which prevented them from ever commenting on the

    course or outcome of negotiations (Sullivan, 2010).

    This removal of Associated Press content represents a useful natural experiment for

    empirical researchers. Since the removal of content was provoked by the intricacies of contract

    negotiation, its timing can be thought of as reasonably exogenous, as it was determined by the

    expiration of the contract rather than any considerations of the popularity (or lack thereof)

    for Associated Press content at that time. As detailed in Table 1, the dispute with the

    Associated Press led Google to remove content by the Associated Press from December 23,

    2009 to February 9, 2010. Fortunately for our purposes, Yahoo! News continued to host

    Associated Press content without interruption during this time, which enables us to use its

    web users' behavior as a control in our regressions. We compare which websites consumers

    navigated to after visiting a news aggregator before and after the removal of content on

    Google for both visitors to Google News and Yahoo! News.

    Table 1: Timeline of negotiations between Google and Associated Press

    Date                 Event
    March 2005           Google is sued by Agence France Presse for copyright infringement
                         after AFP content appeared on Google News.
    August 2006          Google and Associated Press first sign contract to enable Associated
                         Press content to appear on Google News for 30-day window.
    December 24, 2009    Associated Press content no longer appears on Google. Industry press
                         speculates that this is in preparation for the expiration of contract
                         between Associated Press and Google in a month's time.
    End of January 2010  Associated Press and Google contract set to expire.
    February 2010        Associated Press content returns to Google News.

    Critics and supporters alike of news aggregators have proposed numerous arguments for

    whether the removal of Associated Press content may either benefit or hurt news websites.

    On the one hand, if consumers are no longer able to obtain Associated Press news content,

    they may be more likely to seek the news directly from the Associated Press member organizations

    and newspapers. On the other hand, consumers may simply be less likely to seek

    further information about news. In essence, this distinction can be boiled down to whether

    consumers view news aggregators as a complement or substitute to original news sources. Do

    they use news aggregators to identify news stories that they then pursue in greater depth,

    or do they simply stop after reading the first news item? For instance, the Associated Press

    ran a news story about economic depression in Michigan in August 2010. The screenshot

    of how the story appeared on Google News is depicted in Figure 2. The links relating to

    the Associated Press story that appear at the bottom of a typical story are also depicted

    in Figure 2. After reading the Associated Press summary of the story, readers are free to

    explore the issue further in local newspapers such as the Detroit News and Lansing State

    Journal. The question we ask is whether the presence of the Associated Press content on

    Google News makes it more or less likely that a news consumer would then trouble to visit

    Detroit News or the Lansing State Journal, both of which are members of the Associated

    Press Network.

    Figure 2: Example screenshot of Associated Press article hosted on Google News

    Note: Google News, August 1st 2010. Text of article has been slightly edited to fit on page.

    Our analysis is focused on the period immediately prior to and during the removal of

    Associated Press articles from Google News for two reasons. First, it is not immediately clear

    at which point in February Google News and Associated Press resumed their relationship

    and reached a new agreement. Second, it is not apparent whether the reinstatement during

    this time consisted of the older, missing content or new content or whether Google changed

    the presentation of AP articles afterwards. For example, it would be problematic if Google

    decided to highlight Associated Press content after the contract negotiations were concluded,

    perhaps as a sweetener to the deal. For these reasons, we focus on visits to news sites

    during the months of December 2009 and January 2010.

    2.2 Data Description

    Our data derive from Experian Hitwise. Hitwise develops proprietary software that Internet

    Service Providers (ISPs) use to analyze website logs created on their network. Once the ISP

    aggregates the anonymous data, the data are provided to Hitwise. According to their website,

    Hitwise collects these usage data from a geographically diverse range of ISP networks and

    opt-in panels, representing all types of Internet usage, including home, work, education

    and public access. Currently, Hitwise has usage data from a sample of 25 million people

    worldwide.

    We collected information on the sites that users visit immediately after navigating to

    Google News or Yahoo! News. We use weekly data from the week ending December 5, 2009

    to the week ending February 27, 2010 for the top 1500 sites navigated after Google News or

    Yahoo! News. Hitwise reports the fraction of total traffic that arrives at each downstream

    site immediately after a visit to Google News and Yahoo! News. We constructed a 2-

    month panel where the unit of observation is the percentage of weekly clicks a downstream

    website received from either Google News or Yahoo! News. Twenty-six percent of websites

    received incoming traffic from both Google and Yahoo! News. The remainder of websites

    were only visited after navigating to one particular aggregator. This may reflect internal

    complementarities for these companies. For instance, someone using Google News is unlikely

    to navigate to Yahoo! Mail, and similarly someone using Yahoo! News is unlikely to navigate

    to Gmail.
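    The panel described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the function names `build_panel` and `share_reached_from_both`, and every number in `records` are hypothetical and are not taken from the Hitwise data.

```python
from collections import defaultdict

# Hypothetical weekly records: (week_ending, aggregator, downstream_site, pct_clicks).
# All values are invented for illustration; they are not Hitwise data.
records = [
    ("2009-12-05", "google", "nytimes.com", 2.9),
    ("2009-12-05", "yahoo", "nytimes.com", 2.7),
    ("2009-12-05", "google", "gmail.com", 1.6),
    ("2009-12-05", "yahoo", "mail.yahoo.com", 9.9),
    ("2009-12-12", "google", "nytimes.com", 3.0),
    ("2009-12-12", "yahoo", "mail.yahoo.com", 10.1),
]


def build_panel(rows):
    """Index the share of weekly clicks by (site, aggregator, week)."""
    return {(site, agg, week): pct for week, agg, site, pct in rows}


def share_reached_from_both(rows):
    """Fraction of downstream sites that receive traffic from both aggregators."""
    sources = defaultdict(set)
    for _week, agg, site, _pct in rows:
        sources[site].add(agg)
    both = sum(1 for aggs in sources.values() if len(aggs) == 2)
    return both / len(sources)


panel = build_panel(records)
overlap = share_reached_from_both(records)
```

    In this toy input only one of the three downstream sites is reached from both aggregators; the same calculation on the actual panel would correspond to the twenty-six percent overlap reported in the text.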

    We categorized the websites into two main classes: non-news (e.g., Yahoo! Mail, myspace.com)

    and traditional news (e.g., newyorktimes.com, bostonherald.com). We applied

    Hitwise's own categorization of news websites to identify traditional news media, but we

    excluded weather sites and news aggregators from the 5 major search engines (such as Yahoo!

    News, Google News, Huffington Post) from the category.1 We identified a site as an

    aggregator based upon whether or not it produced its own original content.

    We also constructed a separate category for international news (e.g., bbc.com/news,

    hindustantimes.com), which we use in our robustness checks. We would expect the removal

    of Associated Press content to affect traditional news media sites, but the removal should

    not affect visits to international sites that tend to either generate their own content or rely

    on non-American news agencies for their content.

    Table 2 reports the summary statistics for our data. It is striking that 20 percent of

    the time, consumers navigate to a traditional news media website from the news aggregator.

    Traditional news sites captured most traffic. International news received less traffic (5.5

    percent of sites visited) than traditional news sites.

    1 Hitwise reports the top 10,000 ranked news and media sites in November 2009.

    Table 2: Summary statistics for downstream websites from Google News and Yahoo! News

                               Mean     Std Dev   Min   Max    Observations
    % clicks                   0.016    0.19      0     18.3   100503
    Google News                0.50     0.50      0     1      100503
    PeriodDispute              0.67     0.47      0     1      100503
    Traditional News Site      0.20     0.40      0     1      100503
    News Aggregator Site       0.0011   0.033     0     1      100503
    International News Site    0.055    0.23      0     1      100503
    Observations               100503

    Notes: This table reports statistics for websites visited immediately after Google News and Yahoo! News during December 2009 and January 2010. The period during which the dispute occurred between Associated Press and Google News was after December 23, 2009. Traditional news sites refer to news and media sites as defined by Hitwise, excluding weather sites, international news sites, and news aggregators from the top 5 search engines.

  • Table 3 displays the top 50 (traditional) news websites in our dataset and the average

    percentage of downstream clicks they receive. Table 4 displays the top 50 non-news websites

    in our dataset and the average percentage of downstream clicks they receive. As shown in

    Table 4, the top non-news websites reflect the top website brands on the Internet. This is

    suggestive evidence that users of news aggregator sites have both mainstream Internet tastes

    and regard the sites as part of their normal Internet consumption.

    Table 3: Top 50 news websites visited after Google News and Yahoo! News

    Website                      Avg Visit Pct
    abcnews.com                  2.11
    associatedcontent.com        0.11
    bleacherreport.com           0.17
    bloomberg.com                0.51
    boston.com                   0.24
    bostonherald.com             0.19
    businessweek.com             0.15
    cbsnews.com                  0.19
    chron.com                    0.13
    cnn.com                      1.85
    csmonitor.com                0.15
    dallasnews.com               0.11
    drudgereport.com             0.64
    edition.cnn.com              0.20
    examiner.com                 0.65
    foxnews.com                  1.13
    foxnews.com/entertainment    0.082
    foxnews.com/politics         0.20
    freep.com                    0.13
    gather.com                   0.34
    latimes.com                  0.48
    mcclatchydc.com              0.095
    mercurynews.com              0.44
    miamiherald.com              0.15
    msnbc.com                    0.83
    news.com                     0.12
    nj.com                       0.11
    npr.org                      0.16
    nydailynews.com              1.59
    nypost.com                   0.26
    nytimes.com                  2.88
    people.com                   0.39
    philly.com                   0.15
    politico.com                 0.53
    radaronline.com              0.060
    reuters.com                  0.69
    seattlep-i.nwsource.com      0.11
    seattletimes.nwsource.com    0.11
    sfgate.com                   0.17
    sportsillustrated.cnn.com    0.10
    startribune.com              0.084
    thedailybeast.com            0.17
    theweek.com                  0.14
    time.com                     1.16
    upi.com                      0.093
    usatoday.com                 0.72
    usmagazine.com               0.23
    usnews.com                   0.082
    voanews.com                  0.13
    washingtonpost.com           1.74
    wsj.com                      0.86

    Table 4: Top 50 non-news websites visited after Google News and Yahoo! News

    Website                      Avg Visit Pct
    address.yahoo.com            0.12
    amazon.com                   0.59
    aol.com                      0.46
    aralifestyle.com             0.14
    ask.com                      0.19
    bankofamerica.com            0.18
    bing.com                     0.62
    blogsearch.google.com        0.77
    buzz.yahoo.com               0.21
    chase.com                    0.14
    cosmos.bcst.yahoo.com        0.95
    ebay.com                     1.00
    education.yahoo.net          0.34
    espn.com                     0.56
    facebook.com                 6.23
    fastflip.googlelabs.com      3.60
    finance.google.com           0.36
    finance.yahoo.com            0.60
    games.yahoo.com              0.099
    gmail.com                    1.55
    google.com                   11.6
    howlifeworks.com             1.04
    huffingtonpost.com           0.96
    images.google.com            0.50
    latimesblogs.latimes.com     0.16
    livescience.com              0.38
    mail.live.com                1.28
    mail.yahoo.com               9.94
    maps.google.com              0.23
    members.yahoo.com            0.29
    movies.yahoo.com             0.13
    msn.com                      1.03
    my.yahoo.com                 0.67
    myspace.com                  1.54
    news.google.com              0.24
    omg.yahoo.com                0.32
    rivals.com                   0.10
    search.yahoo.com             2.20
    shine.yahoo.com              0.13
    space.com                    0.15
    sports.yahoo.com             0.26
    sports.yahoo.com/nfl         0.13
    tmz.aol.com                  0.20
    tv.yahoo.com                 0.12
    video.google.com             0.27
    weather.com                  0.67
    weather.yahoo.com            0.39
    wikipedia.org                0.50
    yahoo.com                    7.20
    youtube.com                  2.47

    Table 5: Demographic description of users

    Measure        Yahoo! News   Google News   New York Times
    Male           59.95         63.8          61.21
    Age 18-24      12.12         13.89         6.17
    Age 25-34      18.05         14.72         13.93
    Age 35-44      19.03         17.08         12.98
    Age 45-54      21.41         22.24         19.45
    Age 55+        29.38         32.06         47.47
    Income 150k    9.29          9.6           10.77

    Source: Hitwise

    Notes: This table reports the fraction of users of a particular website within each demographic category. Statistics are reported for users of Yahoo! News, Google News, and the New York Times website.

    To verify that Yahoo! News could be considered an appropriate control group for Google

    News, we checked that the users shared similar observable demographics. Table 5 reports the

    fraction of users within each demographic category for a particular site. The users of Yahoo!

    News and Google News do indeed look reasonably similar; they are skewed towards being

    older, predominantly male, and wealthier than the general U.S. population. For comparison,

    we also report demographics for users of the New York Times website. The users of the

    New York Times site are similar to, though significantly older than, the average users of a news

    aggregator. Table 5 also provides suggestive evidence of why the debate over ad revenues from

    news content is so contentious. Users such as these are a remarkably attractive demographic

    group from an advertiser's perspective.

  • 3 Analysis

    Figure 3 summarizes our main analysis. Figure 3 illustrates the mean percentage of down-

    stream traffic for users that visited Google News and Yahoo! News during our period. As

    seen in the graph, little change occurs in downstream site navigation for Yahoo!. However,

    news sites experience a decline in visits from Google News after the removal of Associated

    Press relative to the change in traffic from Yahoo! News.

    Figure 4 extends this analysis to show how visit behavior varies for international news

    sites as well. Once again, little change exists in user behavior for these additional types of

    websites on either Yahoo! News or Google News, suggesting that these sites were not affected

    by the removal of Associated Press content. As expected, these international websites are

    unlikely to be affected by the removal of AP content due to the nature of their content. As

    seen in Figure 5, no such change in clicks occurred in the prior year during the same calendar

    months of December 2008 and January 2009.

    Figure 3: Downstream sites visited after Google News and Yahoo! News

    Notes: This figure shows the average percentage of clicks for news and non-news sites navigated to after visiting Google News and Yahoo! News before and after the removal of The Associated Press from Google News.

    Figure 4: Downstream sites visited after Google News and Yahoo! News

    Notes: This figure shows the average percentage of clicks for a variety of website types navigated to after visiting Google News and Yahoo! News before and after the removal of The Associated Press from Google News.

    To formalize the insights provided by Figures 3 and 4, we run a difference-in-differences

    regression for the policy change and estimate the following regression for the percentage of

    clicks to website i after visiting news aggregator j in week t:

    \[
    \begin{aligned}
    \%clicks_{ijt} = {} & \beta_0 + \beta_1 News_i \times Google_j \times PeriodDispute_t + \beta_2 News_i \times PeriodDispute_t \\
    & + \beta_3 News_i \times Google_j + \beta_4 Google_j + \alpha_i + week_t + \epsilon_{ijt}
    \end{aligned}
    \tag{1}
    \]

    where News is an indicator variable equal to 1 if the website is a traditional news source,

    Google is an indicator variable equal to 1 if the traffic originated after viewing Google News,

    and PeriodDispute is an indicator variable equal to 1 for the weeks after the removal of

    Associated Press from Google News. The controls \alpha_i are downstream-website fixed effects.

    The vector week_t contains weekly fixed effects to capture national variation in the volume

    and interest generated by news stories in that week. The coefficient on the interaction

    term News × Google × PeriodDispute captures the effect of the Associated Press removal

    on visits to traditional news sites compared to non-news sites from Google News with the

    corresponding change in traditional news and non-traditional news sites on Yahoo! as a

    control. We estimate this specification using ordinary least squares and cluster our standard

    errors at the website level to avoid the downward bias reported by Bertrand et al. (2004).

    Figure 5: Downstream sites visited after Google News and Yahoo! News in prior year (December 2008 and January 2009)

    Notes: This figure shows the average percentage of clicks for news and non-news sites navigated to after visiting Google News and Yahoo! News in December 2008 and January 2009 for the year prior to the removal of The Associated Press from Google News.
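    To make the role of the triple interaction concrete, the following Python sketch computes the comparison from cell means. It is a simplified illustration, not the paper's estimator: in a fully saturated specification without fixed effects the coefficient on News × Google × PeriodDispute equals this difference of cell means, whereas the estimates reported here also include website and week fixed effects and clustered standard errors, which the sketch omits. The observations in `obs` and the helper names are hypothetical.

```python
from statistics import mean

# Hypothetical observations: (is_news, is_google, in_dispute, pct_clicks).
# The numbers are invented to illustrate the arithmetic, not taken from the paper.
obs = [
    (1, 1, 0, 0.030), (1, 1, 0, 0.028), (1, 1, 1, 0.024), (1, 1, 1, 0.022),
    (0, 1, 0, 0.010), (0, 1, 0, 0.012), (0, 1, 1, 0.011), (0, 1, 1, 0.013),
    (1, 0, 0, 0.020), (1, 0, 0, 0.022), (1, 0, 1, 0.021), (1, 0, 1, 0.023),
    (0, 0, 0, 0.015), (0, 0, 0, 0.017), (0, 0, 1, 0.016), (0, 0, 1, 0.016),
]


def cell_mean(news, google, dispute):
    """Mean share of clicks within one (news, google, dispute) cell."""
    return mean(y for n, g, d, y in obs if (n, g, d) == (news, google, dispute))


def change(news, google):
    """Change in the mean share from before to during the dispute."""
    return cell_mean(news, google, 1) - cell_mean(news, google, 0)


def triple_difference():
    """News-minus-non-news change on Google, net of the same gap on Yahoo!."""
    google_gap = change(1, 1) - change(0, 1)
    yahoo_gap = change(1, 0) - change(0, 0)
    return google_gap - yahoo_gap


estimate = triple_difference()
```

    A negative value of `estimate` means that news sites lost share from Google News during the dispute relative to non-news sites, over and above whatever happened to the same gap on Yahoo! News.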

    Table 6 reports the results for various regression specifications, incrementally building

    up to our full specification described by equation (1). Very little variation exists in the size

    or precision of our coefficient of interest in each of the columns. The negative coefficient on

    News × Google × PeriodDispute implies that during the dispute with Associated Press, Google

    News users were less likely to visit traditional news websites after visiting Google News. This

    suggests that the presence of Associated Press articles in Google News prompted users to

    seek further information at traditional news sites and thereby encouraged more diversity in

    news consumption.

    Table 6: Downstream traffic from Google News and Yahoo! News before and after the policy change

                                      (1)          (2)          (3)
                                      % clicks     % clicks     % clicks
    PeriodDispute X Google X News     -0.00583     -0.00583     -0.00583
                                      (0.00271)    (0.00284)    (0.00284)
    PeriodDispute X Google            0.00152      0.00152      0.00152
                                      (0.00219)    (0.00229)    (0.00229)
    PeriodDispute                     -0.000514    -0.000514    -0.000560
                                      (0.000655)   (0.000686)   (0.00105)
    Google                            -0.00372     -0.0115      -0.0115
                                      (0.00367)    (0.00617)    (0.00617)
    News                              -0.000393
                                      (0.00356)
    PeriodDispute X News              0.00143      0.00143      0.00143
                                      (0.000963)   (0.00101)    (0.00101)
    News X Google                     0.0184       0.0324       0.0324
                                      (0.00600)    (0.00753)    (0.00753)
    Website Fixed Effects             Yes          Yes          Yes
    Week Fixed Effects                No           No           Yes
    Observations                      100503       100503       100503
    R-Squared                         0.000543     0.581        0.581

    Robust standard errors clustered at website level. *p < 0.1, **p < 0.05, ***p < 0.01. The dependent variable is the fraction of traffic to websites after visiting Google News or Yahoo! News. The policy change is the removal of hosted articles by The Associated Press from Google News.

    News sites on Google experience a decrease in clicks of approximately 0.6 percentage points (a coefficient of -0.00583 on the fraction of traffic) after the removal of Associated Press articles. Compared to the mean share of 2.9 percent before the policy change, this drop represents an approximately 20 percent decrease in traffic to news sites after the removal of Associated Press articles from Google. If the claim in Cohen (2009) is true that Google sends a billion clicks each month to its partner news providers, then this percentage translates into a very large change in the number of clicks that traditional news websites receive. While we do not know the precise international breakdown, our data from Hitwise suggest that 40 percent of all clicks before the policy change went to traditional news media websites hosted in the US. Therefore, this 20 percent decrease could imply an 80 million decrease in visits each month from Google News users to traditional news media websites hosted in the US.
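    The back-of-the-envelope figures in this paragraph can be checked with a few lines of Python. All inputs are taken from the text and Table 6; the variable names are ours, and the billion-click figure is the Cohen (2009) claim rather than something we measure.

```python
# Inputs taken from the text and Table 6.
coef_triple = -0.00583           # PeriodDispute X Google X News (fraction of clicks)
baseline_share = 0.029           # mean share of 2.9 percent before the policy change
monthly_clicks = 1_000_000_000   # Cohen (2009) claim, as cited in the text
us_traditional_share = 0.40      # Hitwise-based share of clicks to US traditional news sites

# Absolute drop in percentage points and drop relative to the baseline share.
pp_drop = abs(coef_triple) * 100
relative_drop = abs(coef_triple) / baseline_share

# Implied monthly loss in clicks, using the rounded 20 percent figure from the text.
lost_clicks_rounded = monthly_clicks * us_traditional_share * 0.20

print(f"Drop in percentage points: {pp_drop:.2f}")
print(f"Relative drop: {relative_drop:.1%}")
print(f"Implied monthly loss (rounded inputs): {lost_clicks_rounded:,.0f}")
```

    The calculation is only as reliable as its inputs: it assumes the billion-click claim holds and that the 40 percent US share applies uniformly to the affected traffic.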

    Our results suggest that news aggregators complement the news sources that they feature by directing traffic to these news sites. The provision of content on news aggregators encourages readers to seek further information from other news sources.


  • 4 Robustness Checks

    We conducted various robustness checks, as reported in Table 7. Columns (1) and (2) check the robustness of our results to alternative specifications. We apply a Tobit regression to account for sites that receive zero clicks in a given week, and also a semi-log regression (see footnote 2). In both regressions, the coefficients of interest have signs similar to those in the main specification; news sites receive less traffic from Google after the policy change.
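    As a rough illustration of these two specifications, the Python sketch below shows the semi-log transform log(%clicks + 0.01) and the log-likelihood of a Tobit model left-censored at zero. It is a minimal sketch under our own assumptions: the censoring point of zero is inferred from the mention of sites with zero clicks, the function names are hypothetical, and the example numbers are made up for illustration and are not from the Hitwise data.

```python
import math


def semi_log(pct_clicks: float, offset: float = 0.01) -> float:
    """Semi-log dependent variable: log(%clicks + 0.01)."""
    return math.log(pct_clicks + offset)


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(y, xb, sigma: float) -> float:
    """Log-likelihood of a Tobit model with left-censoring at zero.

    y     : observed outcomes (zero when censored)
    xb    : linear predictions for the latent outcome
    sigma : standard deviation of the latent error
    """
    total = 0.0
    for yi, mi in zip(y, xb):
        if yi > 0:
            # Uncensored observation: normal density of the residual.
            total += math.log(normal_pdf((yi - mi) / sigma) / sigma)
        else:
            # Censored observation: probability the latent outcome is <= 0.
            total += math.log(normal_cdf(-mi / sigma))
    return total


# Made-up example values, for illustration only.
shares = [0.0, 0.0, 0.03, 0.05]
fitted = [-0.01, 0.01, 0.02, 0.04]

print([round(semi_log(s), 3) for s in shares])
print(tobit_loglik(shares, fitted, sigma=0.02))
```

    The offset keeps the logarithm defined for site-weeks with zero clicks, while the Tobit likelihood treats those zeros as censored draws of a latent outcome instead of as exact values.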

    Columns (3)-(5) check the robustness of the results to alternative definitions of the control group. As described previously, users navigated to a variety of non-traditional news sites after visiting a news aggregator. These sites included both non-traditional and non-Associated Press sources of news. In columns (3) and (4), our robustness checks omit the top news aggregators and international websites, respectively, from the control group. These alternative definitions of the control group could be warranted if the removal of Associated Press content also affected navigation to these sites directly (e.g., if Associated Press content had previously encouraged people to visit international websites, or if the removal of Associated Press content on Google altered people's perceptions of news aggregators). In column (5), we check robustness to removing both news aggregators and international sites from our data. Generally, the results are robust in sign.

    2 For the semi-log regression, we use log(%clicks + 0.01) as the dependent variable.


  • Table 7: Robustness checks: Downstream traffic to local news sites from Google News and Yahoo! News before and after the policy change

                                       (1)               (2)               (3)               (4)               (5)
                                       Tobit             Semi-log          No Aggregators    No International  News vs Non-News

    PeriodDispute X Google X AllNews   -0.0229           -0.0216           -0.00583          -0.00620          -0.00632
                                       (0.00924)         (0.0127)          (0.00284)         (0.00303)         (0.00306)
    PeriodDispute X Google             0.00394           -0.00783          0.00152           0.00178           0.00179
                                       (0.00557)         (0.00572)         (0.00230)         (0.00249)         (0.00251)
    PeriodDispute                      -0.00556          -0.00685          -0.000464         -0.000762         -0.000763
                                       (0.00446)         (0.00527)         (0.00108)         (0.00111)         (0.00112)
    Google                             0.0248            -0.0207           -0.0114           -0.0153           -0.0155
                                       (0.00874)         (0.0123)          (0.00619)         (0.00674)         (0.00680)
    AllNews                            0.112
                                       (0.0208)
    PeriodDispute X AllNews            0.0152            0.0155            0.00142           0.00148           0.00154
                                       (0.00564)         (0.00876)         (0.00101)         (0.00104)         (0.00105)
    AllNews X Google                   -0.0126           0.0785            0.0323            0.0364            0.0362
                                       (0.0142)          (0.0232)          (0.00754)         (0.00804)         (0.00812)

    Website Fixed Effects              No                Yes               Yes               Yes               Yes
    Week Fixed Effects                 Yes               Yes               Yes               Yes               Yes
    Observations                       100503            100503            100395            94959             94203
    R-Squared                                            0.684             0.580             0.581             0.581

    Robust standard errors clustered at website level. *p