Advertising, Searching, Blogging and New-Product Sales
Ho Kim and Dominique M. Hanssens
August 2013
Ho Kim is Assistant Professor of Marketing, School of Business and Management, Azusa Pacific University, 901
E. Alosta Ave. Azusa, CA 91702 (e-mail: [email protected]). Dominique M. Hanssens is the Bud Knapp Professor of
Marketing, UCLA Anderson School of Management, 110 Westwood Plaza, Los Angeles, CA 90095 (Email:
[email protected], Phone: (1) 310-825-4497, Fax: (1) 310-206-7422).
ABSTRACT
This paper presents new findings about how advertising and online word-of-mouth influence product sales
by examining the relationship between advertising, blog volume, search volume, and revenue of theatrically
released movies. To study this relationship over three phases of a movie’s life cycle (the pre-launch
period, the opening week, and the post-launch period), we develop three separate models and estimate them
while correcting for endogeneity. We find that the majority of the advertising effect on revenue is realized
without consumers’ online search activity, whereas the effect of blogs on revenue materializes only
indirectly, through online search. Moreover, advertising is about three times more effective than blog volume
in generating movie revenue. Other findings include, but are not limited to, the following. Search volume has
better predictive power than blog volume: pre-launch blog volume does not explain the variation in
opening-week revenue once pre-launch search volume is controlled for, and after the opening week, weekly blog
volume does not explain the variation in weekly revenue once weekly search volume is included. In the
pre-launch period, consumers respond to advertising by searching and blogging online, but searching is five
times more responsive than blogging. Our findings indicate the importance of monitoring consumers’ online
search activity.
Keywords: advertising, online search volume, blog volume, revenue prediction, endogeneity, instrumental
variables.
1. Introduction
The sales of an entertainment product depend on, among other things, advertising, word-of-mouth,
and search activity around the product. Advertising is an effective tool to lift sales of an
entertainment product. For example, the advertising elasticity of a movie is over 0.2 (Elberse and
Eliashberg 2003), which is about twice the average sales-to-advertising elasticity
(Hanssens 2009). Online word-of-mouth (WOM) is also an important factor due to the
experiential nature of entertainment products. Previous research finds that the volume and valence of
online WOM influence the success of an entertainment product (Chevalier and Mayzlin 2006,
Chintagunta et al. 2010, Dhar and Chang 2009, Duan et al. 2008ab, Gopinath et al. 2013, Liu
2006, Onishi and Manchanda 2012). Finally, to the extent that online WOM matters, sales
will depend on consumers’ online search activity. This is because a consumer’s online WOM
can influence the purchase behavior of others only if it reaches other consumers, and online
search is an effective way of finding WOM on the Internet.
The three influencers, however, may not be equally important to managers, for the following
reasons. First, advertising is delivered through various media, while online WOM is carried only on
the Internet. Thus, online WOM is more limited than advertising in reaching consumers. Second,
online search activity is substantially more prevalent than online WOM activity. According to a
recent study, 32% of U.S. adult Internet users read someone else’s online journal or blog,
and 32% posted a comment or review online about products they bought,
while 91% of U.S. adult Internet users used a search engine to find information (The Pew
Research Center 2012). These statistics suggest that online search data may be more useful than
online WOM data for predicting market outcomes such as product sales. Third, market outcomes are
influenced not by WOM generated but by WOM consumed (Onishi and Manchanda 2012), because a
consumer’s online WOM can influence others’ purchase behavior only if it reaches them. Therefore,
consumers’ search intensity for a product, or online search volume, should play a role in the
relationship between sales and online WOM volume.
Thus, the following questions arise: (i) How does online WOM differ from advertising in
generating sales of a new entertainment product? (ii) What role does consumers’ online search
activity play in the revenue generation process of WOM and advertising? (iii) Between WOM
volume and search volume, which is more important for accurately predicting the sales of an
entertainment product? (iv) Finally, how are the focal variables related to each other before and
after product launch?
To answer the questions, we develop empirical models of advertising, online WOM, online
search, and revenue, and apply them to a weekly data set of 153 theatrically released movies.
The weekly data set spans from 30 weeks before each movie’s release until 10 weeks after the
release. Weekly blog volume is used to measure online WOM activity. We prefer blogs to user
reviews because consumers post blogs even long before a movie’s launch, while user reviews are
available only after the launch. The weekly Google keyword search index is used to measure
online search activity.
A theatrically released movie passes through three distinct periods: the pre-launch period, the
opening week, and the post-launch period. We therefore develop three separate models, one for
each period. The pre-launch period model is a panel data model that examines the relationships
among advertising, blog volume, and online search volume over the pre-launch period. The
post-launch period model is also a panel data model; it examines the relationships among
advertising, blog volume, online search volume, and movie revenue over the post-launch period.
The opening-week model is a cross-sectional model that examines the effects of pre-launch
blog and search volume on the opening-week revenue of movies. Isolating causal relationships
among the focal variables is challenging because they are determined simultaneously. We identify
exogenous variables for the weekly blog and search volume of individual movies and use
exclusion restrictions to identify the causal effects. To identify the effect of movie consumption on
search and blog volume, we apply a covariance restriction.
We find that the majority of the advertising effect on revenue is realized without consumers’ online
search activity, whereas the effect of blogs on revenue materializes only indirectly, through
online search. Moreover, advertising is about three times more effective than blog volume in
generating movie revenue. Other findings include, but are not limited to, the following. Search
volume has better predictive power than blog volume: pre-launch blog volume does not explain
the variation in opening-week revenue once pre-launch search volume is controlled for, and after the
opening week, weekly blog volume does not explain the variation in weekly revenue once weekly
search volume is included. In the pre-launch period, consumers respond to advertising by searching
and blogging online, but searching is five times more responsive than blogging. Within a week,
movie consumption feeds back into online searching and blogging, implying a virtuous cycle.
Our findings indicate the importance of monitoring consumers’ online search activity.
The remainder of the study is organized as follows. We first summarize previous research
and describe our movie data set. Next, we develop three models to answer our research
questions, apply them to our movie data set, and discuss the findings and managerial
implications. We then draw conclusions and outline areas for future research. The appendix
explains how the parameters are identified, and the web appendix explains how the Google
search index is transformed into a cross-sectionally comparable search volume metric.
2. Relevant Research
Previous studies find that both volume and valence of WOM influence product sales. Liu (2006)
and Duan et al. (2008ab) find that WOM volume offers significant explanatory power for
aggregate and weekly box-office revenue. Dhar and Chang (2009) show that future sales of
music albums are positively correlated with the volume of blog posts about the albums.
Chintagunta et al. (2010) find that valence captured by the average user rating explains
designated market area (DMA)-level opening-day box-office revenues. Gopinath et al. (2013)
find that the volume of blogs has significant effects on opening-week movie sales, but that after
release the valence of blogs predicts movie sales. Finally, Chevalier and Mayzlin (2006) find that
an improvement in book ratings leads to an increase in relative sales.
Others have examined how commercial media and social media influence market outcomes.
Onishi and Manchanda (2012) find that advertising and blogging are synergistic, and cumulative
blogs are predictive of market outcomes. Villanueva et al. (2008) find that customers acquired
through WOM add more long-term value to a firm than customers acquired through advertising.
Trusov et al. (2009) find that WOM referrals have substantially longer carryover effects than
traditional marketing actions in acquiring new members at an Internet social networking site.
When it comes to the antecedents of online WOM, previous literature finds that advertising,
the volume of the previous period’s WOM, and current and past market outcomes are positively
correlated with the volume of the current period’s online reviews and blog postings (Duan et al.
2008ab, Liu 2006, Onishi and Manchanda 2012). Yang et al. (2013) find that consumers’
propensity to generate WOM is positively correlated with their media exposure, whereas the
evidence on their propensity to consume WOM is mixed.
In contrast to the ample research on online WOM, few studies have examined how consumers’
online search is associated with other managerial variables. Two exceptions are Joo et al. (2012),
who document a significant effect of TV advertising on consumers’ online search activity, and
Kulkarni et al. (2012), who study the predictive power of online search volume. Neither study,
however, examines online search in a setting that includes advertising, online WOM, and market
outcomes together.
To summarize, prior empirical research has examined the relationship between advertising,
online WOM and market outcomes, but such studies have not considered how consumers’ online
search interacts with the other variables. Therefore, the managerial implications of online search
are not well appreciated. Also, the changing relationship among the variables before and after a
new product’s release has received little attention. By examining the relationship between
advertising, blog volume, online search volume, and revenue in the U.S. motion picture industry,
we aim to provide new insights for movie studios and entertainment product managers. Table 1
summarizes the previous studies mentioned above to help clarify the contribution of this
study.
== Table 1 about here ==
3. The Data
Our database consists of 153 movies, most of which were released in 2009. For each of the 153
movies, we collect weekly advertising spending, blog volume, and search volume, from 30
weeks before its release through 10 weeks after the release. Additionally, weekly screen numbers
and revenue are collected from the opening week through week 10 after release. For use as
instruments in estimating our models, we also collect various movie characteristics, weekly
Google search indices of the keyword “opening movie,” and weekly traffic to the five most
popular blog sites—i.e., blogger.com, tumblr.com, wordpress.com, squarespace.com, and
posterous.com.
3.1. Advertising and Box Office Revenue
Our advertising data, collected by Nielsen, cover expenditures in major media such as television,
print, radio, and outdoor. The average advertising spending of the 153 movies during the
analysis period (from 30 weeks before release to 10 weeks after release) is $21 million,
with 80% of the advertising budget spent in the pre-launch period or during the release week.
Box-office revenue is collected from The Numbers (www.the-numbers.com). One hundred and
thirty-nine movies were exhibited for at least 10 weeks, and the shortest movie run was five
weeks. The median U.S. box-office revenue in our sample is $44 million. Figure 1(a) shows
weekly advertising spending and box-office revenue averaged across the 153 movies.
3.2. Blog Postings and Search Volume
Weekly blog postings for each movie were collected from the Google blog search engine
(www.google.com/blogsearch). To minimize noise in the data collection process, we constrained
our search to blogs whose titles contain either the word “movie” or “film.” Our general rule for
constructing search keywords for blog postings is <movie title> + “movie.”1 For example, to find
blog postings about the movie Avatar, we used the keyword “Avatar movie.” For movies with long
titles, shortened keywords were used. For instance, to find blog postings about the movie
Bad Lieutenant: Port of Call New Orleans, we searched for postings that contain the
keyword “Bad Lieutenant” in their title. For each week of each movie, we repeated the search
five times and used the mode of the resulting numbers of blog postings.2
1 We developed several different versions of search keywords and found that <movie title> + “movie” is the most
appropriate keyword to collect blog postings of the focal movie. This judgment is based on the empirical fit between
the movie’s weekly advertising, launch schedule, and the collected blog volumes.
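The weekly counting rule above (five repeated searches, keep the mode) can be sketched in a few lines of Python. The function name and input format are hypothetical, and the counts are invented for illustration.

```python
from statistics import mode


def weekly_blog_count(trials):
    """Return the most frequent blog-posting count among repeated searches.

    `trials` holds the counts returned by the five repeated Google blog
    searches for one movie-week (hypothetical input format).
    """
    if len(trials) != 5:
        raise ValueError("expected exactly five search trials")
    return mode(trials)


# Hypothetical counts: four identical results and one outlier.
print(weekly_blog_count([412, 412, 409, 412, 412]))  # 412
```

Because the disagreements described in footnote 2 involve a single outlier against four identical values, the mode is always well defined in this setting.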
For weekly search volume of movies, we relied on Google Trends. Google Trends shows the
weekly search index of the entered keyword. The raw index provided by Google is normalized to
conceal the actual search volume of the keyword entered into the Google search engine. This
normalization causes a problem when comparing search volumes of different movies. We
propose a method that transforms the raw search index of Google to a cross-sectionally
comparable search volume measure. The detailed methodology of collecting weekly Google
search indices of movie keywords and transforming them into cross-sectionally comparable
measures can be found in the web appendix. Figures 1(b) and 1(c) show how the average blog
volume, search volume, and revenue change on a weekly basis over the movie lifecycle.
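The authors’ own transformation is described in the web appendix and is not reproduced here. The sketch below only illustrates why normalized indices are not directly comparable, using a common workaround that may differ from the authors’ method: query each movie together with one fixed anchor keyword and divide by the anchor’s series from the same request. It assumes the current Google Trends convention that all series in one request share a single scaling factor chosen so that the highest point equals 100. The function name and all numbers are hypothetical.

```python
def to_anchor_units(movie_index, anchor_index):
    """Express a movie's Google Trends index in units of a shared anchor keyword.

    Assumes (hypothetically) that the movie and anchor series come from the
    SAME request, so they share one unknown scaling constant, and that every
    request uses the same anchor keyword and the same calendar window.
    """
    anchor_total = sum(anchor_index)
    if anchor_total == 0:
        raise ValueError("anchor keyword has no recorded search interest")
    # Dividing by the anchor cancels the request-specific scaling constant.
    return [value / anchor_total for value in movie_index]


# Two hypothetical requests, each rescaled so that its own peak equals 100.
movie_a, anchor_in_a = [100, 60, 20], [10, 10, 10]
movie_b, anchor_in_b = [40, 100, 30], [50, 50, 50]
print(to_anchor_units(movie_a, anchor_in_a))  # movie A peaks at about 3.33 anchor units
print(to_anchor_units(movie_b, anchor_in_b))  # movie B peaks at about 0.67 anchor units
```

Both raw series peak at 100, yet in anchor units movie A’s peak is about five times movie B’s. This is the kind of cross-sectional difference the raw index hides.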
== Figure 1 about here ==
3.3. Other Variables
We collect the weekly Google search index of the keyword “opening movie,” weekly traffic to
the five most popular blog sites, and various movie characteristics. They are used as instruments
in estimating our models.
The weekly search index of the keyword “opening movie” was collected from Google
Trends. Using the search filters of Google Trends, we consider only search activity related
to the U.S. movie industry. The weekly traffic of the five most popular blog sites (blogger.com,
tumblr.com, wordpress.com, squarespace.com, and posterous.com) was collected
from Alexa.com, a subsidiary of Amazon.com. Alexa collects daily traffic information on
websites based on a global panel of toolbar users. The panel consists of millions of people using
toolbars created by over 25,000 publishers, including Alexa and Amazon. We aggregate the daily
reach of the five blog sites to the weekly level to derive the weekly total reach of the blog sites.
Figure 2 illustrates the weekly search index of the keyword “opening movie” and the weekly
reach of the five blog sites from the first week of 2008 to the last week of 2009.
2 In most cases, the five search trials give a unanimous result; that is, the Google blog search engine returns the same
number of blog postings for the same search condition. In some rare cases, the five search trials disagree, with one
outlier and four identical numbers. In these cases, we use the four identical numbers (i.e., the mode) as the number of
blog postings.
== Figure 2 about here ==
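The daily-to-weekly aggregation of blog-site reach can be sketched as follows. The paper gives no week definition or aggregation operator beyond the description above, so two choices here are assumptions: the daily reach figures are simply summed across the five sites and across days, and days are bucketed into ISO calendar weeks. The input layout is hypothetical.

```python
from collections import defaultdict
from datetime import date


def weekly_total_reach(daily_reach):
    """Sum daily reach across blog sites and days into ISO (year, week) buckets.

    `daily_reach` maps (site, day) -> reach; the key layout, units,
    and the choice of ISO weeks are assumptions for illustration.
    """
    weekly = defaultdict(float)
    for (_site, day), reach in daily_reach.items():
        iso_year, iso_week, _weekday = day.isocalendar()
        weekly[(iso_year, iso_week)] += reach
    return dict(weekly)


sample = {
    ("blogger.com", date(2009, 1, 5)): 1.5,   # Monday of ISO week 2, 2009
    ("tumblr.com", date(2009, 1, 6)): 0.5,    # same ISO week
    ("blogger.com", date(2009, 1, 12)): 2.0,  # following ISO week
}
print(weekly_total_reach(sample))  # {(2009, 2): 2.0, (2009, 3): 2.0}
```

If reach is measured as a share of unique users, summing across sites and days counts overlapping visitors more than once. That is acceptable for an intensity index but not for a count of distinct bloggers.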
We also collect various movie characteristics. They include genre, MPAA rating, monthly
seasonality, whether the movie is a sequel, average critic rating from Metacritic
(www.metacritic.com), and director power. For director power, we collect three variables: the
total revenue of past movies in which the focal movie’s director was involved as a director,
writer, or producer, from 1990 up to one calendar year before the focal movie’s release; the
average user rating of such movies; and the standard deviation of user ratings of such movies.
Table 2 summarizes the variables and their sources and Table 3 shows the descriptive statistics
of the main variables.
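The three director-power covariates can be computed as below. Three choices are assumptions: the record layout (keys `year`, `revenue`, `user_rating`) is hypothetical, the window is read as release years from 1990 through the year before the focal release, and the population standard deviation is used because it stays defined when there is only a single past film.

```python
from statistics import mean, pstdev


def director_power(past_films, focal_release_year):
    """Return (total past revenue, mean user rating, SD of user ratings).

    `past_films` lists films the director worked on as director, writer,
    or producer, as dicts with hypothetical keys "year", "revenue", and
    "user_rating". The window (1990 through the year before the focal
    release) is one reading of the rule in the text.
    """
    eligible = [f for f in past_films
                if 1990 <= f["year"] <= focal_release_year - 1]
    if not eligible:
        return 0.0, None, None
    ratings = [f["user_rating"] for f in eligible]
    total_revenue = sum(f["revenue"] for f in eligible)
    return total_revenue, mean(ratings), pstdev(ratings)


films = [
    {"year": 1988, "revenue": 50.0, "user_rating": 6.0},   # before 1990: excluded
    {"year": 2003, "revenue": 120.0, "user_rating": 7.0},
    {"year": 2007, "revenue": 80.0, "user_rating": 8.0},
    {"year": 2009, "revenue": 200.0, "user_rating": 5.0},  # same year as focal film: excluded
]
print(director_power(films, 2009))  # (200.0, 7.5, 0.5)
```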
== Table 2 about here ==
== Table 3 about here ==
4. Modeling
We develop three separate models (the pre-launch period model, the opening-week model, and the
post-launch period model) to deal with the three different phases that a theatrically released movie
goes through. The pre-launch period model examines the relationships among advertising, blog
volume, and search volume of a movie until one week before its release. The model consists of
two equations whose dependent variables are weekly blog volume (blog equation) and weekly
search volume (search equation). The post-launch period model examines the relationships
among advertising, blog volume, search volume, and revenue of a movie after its opening
week. This model consists of three equations whose dependent variables are weekly blog volume
(blog equation), weekly search volume (search equation), and weekly revenue (revenue
equation). The opening-week model examines the effects of pre-launch search volume and
pre-launch blog volume on opening-week revenue after controlling for other relevant
variables. We distinguish the opening-week model from the post-launch period model because
a movie’s opening week has distinctive significance to studio managers. Note that the
opening-week model is necessarily a cross-sectional model, while the pre- and post-launch
period models are panel data models.
We first discuss the two panel data models, whose specifications are based on the following
considerations.
First, we assume that weekly variation in advertising spending is exogenous, based on
the following institutional feature of movie advertising (Elberse and Anand 2007, Onishi and
Manchanda 2012). Movie studios typically purchase the vast majority of their advertising in the
upfront market, which makes it difficult for a studio to buy additional advertising based on
new information. Additional purchases of advertising in the opportunistic market, if any, are
affected by exogenous events. This implies that week-to-week variation in a movie’s advertising
spending is mainly driven by exogenous factors.
Second, we explain the weekly variation in each endogenous variable with the weekly
variation in advertising and the other endogenous variables. For example, we explain the weekly
search volume of a movie in the post-launch period with weekly advertising, blog volume, and
revenue. We discuss the determination of lag lengths below.
Third, we do not include lagged dependent variables to explain the weekly variation in
dependent variables. For instance, we exclude lagged search volume to explain the variation in
current week’s search volume. Instead, we include a sufficient number of lags of advertising and
the other endogenous variables, wherever possible. This approach is preferred in previous
studies (Basuroy et al. 2006, Elberse and Eliashberg 2003, Liu 2006) because it avoids the
endogeneity caused by possible correlation between the lagged dependent variables and the
error term of the equation.
Fourth, we include the time-specific effect but not the individual movie-specific effect in
each equation. In panel data models, it is usually recommended to control for unobserved
individual or time effects. However, when lagged endogenous variables are included as
covariates, as in our models, controlling for individual-specific effects leads to biased
estimators (Baltagi 1995, Elberse and Eliashberg 2003). An alternative is to include only a
common intercept. This is preferred if the cross-sectional units do not have significantly different
individual fixed effects after controlling for the effects of the covariates. We conducted the
Holtz-Eakin (1988) test to examine whether significantly different individual fixed effects exist. The
test results revealed no evidence of significant individual fixed effects. Based on these
results, we include only the common intercept in each equation.
Fifth, based on unit-root test results, we first-difference the pre-launch period model but
not the post-launch period model.
Sixth, we need to determine the lag lengths of the covariates. Including more lags helps
mitigate the bias from potential model misspecification (Enders 2004). However,
including many lags reduces the degrees of freedom of the model. Also, if the time-series
variables are highly autocorrelated, including both contemporaneous and lagged values as
covariates can cause a collinearity problem. In Table 4, we check the serial correlation
coefficients of the variables in the pre-launch (equations (1) and (2)) and post-launch
(equations (3)–(5)) models.
== Table 4 about here ==
The serial correlation coefficients suggest that including lagged explanatory variables in the
post-launch period model can cause a serious collinearity problem. To alleviate this problem,
we include only the contemporaneous variables in the post-launch period model.3 The serial
correlations of the variables of the (first-differenced) pre-launch period model are weak, and we
have thirty data points for each movie. As such, in the pre-launch period model, we include four
lags of the explanatory variables to lessen the misspecification problem that may result from
including an insufficient number of lags.
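The serial-correlation check behind Table 4, and the reason first-differencing helps, can be illustrated on simulated data rather than the movie sample. The sketch computes the lag-1 sample autocorrelation of a random walk (a unit-root series) before and after first-differencing. It is not the unit-root test used in the paper.

```python
import random


def lag1_autocorr(x):
    """Sample first-order serial correlation coefficient of a series."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def first_difference(x):
    """Week-over-week changes: x[t] - x[t-1]."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


rng = random.Random(42)
level = [0.0]
for _ in range(1999):
    level.append(level[-1] + rng.gauss(0.0, 1.0))  # random walk: a unit-root series

print(round(lag1_autocorr(level), 3))                    # close to 1
print(round(lag1_autocorr(first_difference(level)), 3))  # close to 0
```

A level series this persistent makes contemporaneous and lagged values nearly collinear, whereas its differences are close to white noise. This is the contrast the text draws between the post-launch (levels) and pre-launch (differenced) models.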
Seventh, we improve the identification of parameters by adding exogenous variables in
the blog and search equations. In the search equation, we include the weekly Google search
index of the keyword “opening movie”; in the blog equation, we include the weekly reach of the
five most popular blog sites.
Finally, we do not include movie characteristics on the right-hand sides of the equations; they are
reserved as instruments (Elberse and Eliashberg 2003).
With the above considerations, the pre- and post-launch period models are developed as follows.
3 Moreover, we have only ten data points for each movie in the post-launch period. This indicates that the
post-launch period model may suffer from too few degrees of freedom if we include lagged covariates.
4.1. The Pre-Launch Period Model
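The pre-launch period model is specified by equations (1) and (2).
$$\Delta\ln(\mathrm{Blog}_{it}) = \sum_{k=0}^{4}\alpha^{A}_{k}\,\Delta\ln(\mathrm{Ad}_{i,t-k}) + \sum_{k=0}^{4}\alpha^{S}_{k}\,\Delta\ln(\mathrm{Search}_{i,t-k}) + \gamma^{B}\,\Delta\ln(\text{Traffic to blog sites}_{it}) + \mathbf{c}_{it}'\boldsymbol{\delta}^{B} + \tau^{B}_{t} + u^{B}_{it} \tag{1}$$
$$\Delta\ln(\mathrm{Search}_{it}) = \sum_{k=0}^{4}\beta^{A}_{k}\,\Delta\ln(\mathrm{Ad}_{i,t-k}) + \sum_{k=0}^{4}\beta^{B}_{k}\,\Delta\ln(\mathrm{Blog}_{i,t-k}) + \gamma^{S}\,\Delta\ln(\text{Volume of keyword ``opening movie''}_{it}) + \mathbf{c}_{it}'\boldsymbol{\delta}^{S} + \tau^{S}_{t} + u^{S}_{it}, \tag{2}$$
where $\Delta$ represents first-differencing. $\mathrm{Blog}_{it}$ is the blog volume, $\mathrm{Ad}_{it}$ is the advertising spending,
and $\mathrm{Search}_{it}$ is the search volume of movie $i$ in pre-launch week $t$. The column vector $\mathbf{c}_{it}$ is the set
of control variables that might influence the blog volume and search volume of movie $i$ in pre-launch
week $t$; in our analysis, $\mathbf{c}_{it}$ consists of the holiday dummy variable. $\gamma^{B}$ and $\gamma^{S}$ are the effects of the
exogenous shifters (blog-site traffic and the “opening movie” search index), and $\tau^{B}_{t}$ and $\tau^{S}_{t}$ are the
time-fixed effects of pre-launch week $t$.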
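To make the lag structure of equation (1) concrete, the following sketch builds the dependent variable and the ten focal regressors (five advertising terms and five search terms, k = 0 to 4) for a single movie’s 30 pre-launch weeks. Several simplifications are assumptions: the function names are hypothetical, the exogenous shifter, holiday control, and week effects are left out for brevity, and all weekly values are taken to be strictly positive so that logs exist. Differencing consumes one week and the four lags consume four more, which leaves 25 usable rows per movie.

```python
import math

LAGS = 4  # contemporaneous term plus four lags, as in equation (1)


def diff_log(series):
    """First difference of the natural log of a strictly positive series."""
    logs = [math.log(v) for v in series]
    return [logs[t] - logs[t - 1] for t in range(1, len(logs))]


def prelaunch_blog_rows(blog, ad, search):
    """Build (y, x) rows for the blog equation of one movie (hypothetical helper).

    y is the change in ln(Blog) in week t; x holds the changes in ln(Ad) and
    ln(Search) in weeks t, t-1, ..., t-4. Inputs are weekly levels in
    calendar order.
    """
    d_blog, d_ad, d_search = diff_log(blog), diff_log(ad), diff_log(search)
    rows = []
    for t in range(LAGS, len(d_blog)):
        x = [d_ad[t - k] for k in range(LAGS + 1)]
        x += [d_search[t - k] for k in range(LAGS + 1)]
        rows.append((d_blog[t], x))
    return rows


weeks = 30
blog = [10 + w for w in range(weeks)]           # invented weekly levels
ad = [1.0 + 0.1 * w for w in range(weeks)]
search = [100 + 5 * w for w in range(weeks)]
rows = prelaunch_blog_rows(blog, ad, search)
print(len(rows), len(rows[0][1]))  # 25 10
```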
4.2. The Post-Launch Period Model
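The post-launch period model is specified by equations (3)–(5).
$$\ln(\mathrm{Blog}_{it}) = \alpha^{A}\ln(\mathrm{Ad}_{it}) + \alpha^{S}\ln(\mathrm{Search}_{it}) + \alpha^{R}\ln(\mathrm{Revenue}_{it}) + \gamma^{B}\ln(\text{Traffic to blog sites}_{it}) + \mathbf{c}_{it}'\boldsymbol{\delta}^{B} + \tau^{B}_{t} + u^{B}_{it} \tag{3}$$
$$\ln(\mathrm{Search}_{it}) = \beta^{A}\ln(\mathrm{Ad}_{it}) + \beta^{B}\ln(\mathrm{Blog}_{it}) + \beta^{R}\ln(\mathrm{Revenue}_{it}) + \gamma^{S}\ln(\text{Volume of keyword ``opening movie''}_{it}) + \mathbf{c}_{it}'\boldsymbol{\delta}^{S} + \tau^{S}_{t} + u^{S}_{it} \tag{4}$$
$$\ln(\mathrm{Revenue}_{it}) = \theta^{A}\ln(\mathrm{Ad}_{it}) + \theta^{B}\ln(\mathrm{Blog}_{it}) + \theta^{S}\ln(\mathrm{Search}_{it}) + \theta^{D}\ln(\mathrm{Scrns}_{it}) + \mathbf{c}_{it}'\boldsymbol{\delta}^{R} + \tau^{R}_{t} + u^{R}_{it} \tag{5}$$
$\mathrm{Blog}_{it}$ is the blog volume, $\mathrm{Ad}_{it}$ is the advertising spending, $\mathrm{Search}_{it}$ is the search volume,
$\mathrm{Revenue}_{it}$ is the revenue, and $\mathrm{Scrns}_{it}$ is the number of screens of movie $i$ in post-launch week $t$. The vector
$\mathbf{c}_{it}$ contains the variables other than our focal variables that might influence the weekly blog volume, search
volume, and revenue; it consists of the holiday dummy variable. $\tau^{B}_{t}$, $\tau^{S}_{t}$, and $\tau^{R}_{t}$ are the time-fixed
effects of post-launch week $t$. In the blog and search equations, we include weekly revenue to examine the
effect of movie consumption on blogging (blog equation) and searching (search equation)
activity. As the number of screens is an important determinant of movie revenue, we control for
the screen effect in the revenue equation. For identification purposes, we assume that the error
term of the revenue equation is uncorrelated with those of the blog and search equations:
$$\operatorname{Cov}(u^{B}_{it}, u^{R}_{it}) = \operatorname{Cov}(u^{S}_{it}, u^{R}_{it}) = 0 \quad \text{for any } i \text{ and } t.$$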
4.3. The Opening-Week Model
We develop the opening-week model to examine the effect of pre-launch blog and search
volume on the opening-week revenue of movies. It is a cross-sectional model, as only one data
point is observed per movie in the opening week. The model is specified as follows.
$$\ln(\mathrm{Open\_Revenue}_{i}) = \beta_{0} + \beta_{1}\ln(\mathrm{Open\_Ad}_{i}) + \beta_{2}\ln(\mathrm{Open\_Scrns}_{i}) + \beta_{3}\ln(\mathrm{PreLaunch\_Blog}_{i}) + \beta_{4}\ln(\mathrm{PreLaunch\_Search}_{i}) + \beta_{5}\,\mathrm{Holiday}_{i} + \beta_{6}\,\mathrm{Critic\_Review}_{i} + u^{R}_{i}, \tag{6}$$
where $\mathrm{Open\_Revenue}_{i}$ is the opening-week revenue, $\mathrm{Open\_Ad}_{i}$ is the opening-week advertising
spending, $\mathrm{Open\_Scrns}_{i}$ is the number of opening-week screens, $\mathrm{PreLaunch\_Blog}_{i}$ is the blog
volume cumulated during the pre-launch period of movie $i$, $\mathrm{PreLaunch\_Search}_{i}$ is the search
volume cumulated during the pre-launch period of movie $i$, $\mathrm{Holiday}_{i}$ indicates whether
movie $i$’s opening week contains any national holidays, and $\mathrm{Critic\_Review}_{i}$ is the average critic
rating of movie $i$ as collected from Metacritic.com. Other movie characteristics are reserved as
instruments.
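For a single movie, the regressors of equation (6) could be assembled as follows. The function and argument names are hypothetical. Pre-launch volumes are cumulated by summing the weekly values, and, as in equation (6), the holiday indicator and the critic rating enter in levels rather than logs. Estimation itself (with instruments) is not shown.

```python
import math


def opening_week_row(open_revenue, open_ad, open_screens,
                     prelaunch_blog, prelaunch_search, holiday, critic_review):
    """Return (y, x) for one movie in the opening-week regression (hypothetical helper)."""
    y = math.log(open_revenue)
    x = {
        "const": 1.0,
        "ln_open_ad": math.log(open_ad),
        "ln_open_scrns": math.log(open_screens),
        # Pre-launch volumes are cumulated over all pre-launch weeks, then logged.
        "ln_prelaunch_blog": math.log(sum(prelaunch_blog)),
        "ln_prelaunch_search": math.log(sum(prelaunch_search)),
        # These two enter in levels, not logs.
        "holiday": 1.0 if holiday else 0.0,
        "critic_review": float(critic_review),
    }
    return y, x


# Invented values for one movie.
y, x = opening_week_row(open_revenue=30.0e6, open_ad=8.0e6, open_screens=3000,
                        prelaunch_blog=[5, 10, 20], prelaunch_search=[100, 200, 400],
                        holiday=True, critic_review=62)
print(round(y, 3), round(x["ln_prelaunch_blog"], 3))
```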
4.4. Identification and Estimation
Identifying the model parameters is a challenging task. We use exclusion restrictions and a
covariance restriction to identify the parameters. Then, we use a generalized method of moments
procedure to estimate the parameters. Details on the identification and estimation can be found in
Appendix A.
5. Empirical Results
5.1. Post-Launch Analysis Results
The post-launch period model is estimated by ordinary least squares (OLS) and the generalized
method of moments (GMM), and the results are reported in Table 5. We also estimate two nested
versions of the post-launch period model, the results of which are reported in Tables 6 and 7. The
first nested version, in Table 6, assumes that weekly search volume data are not collected, while
the second nested version, in Table 7, assumes that they are collected. We juxtapose the
statistically significant relationships among the endogenous variables of the three versions in
Figure 3.
== Table 5 about here ==
== Table 6 about here ==
== Table 7 about here ==
== Figure 3 about here ==
First, we uncover how advertising and blogging influence revenue differently. Advertising
influences revenue directly as well as indirectly through search. Moreover, the direct effect of
advertising accounts for the majority of its total effect: doubling weekly
advertising increases same-week revenue by 22.5 percent directly and by 3.3 percent (= 0.104 ×
0.234) indirectly through consumer search. In contrast, the effect of blogs on revenue is realized
only indirectly, through consumer search activity. The elasticity of revenue to blog volume is 0.081
(= 0.348 × 0.234), which is less than one third of the advertising elasticity of revenue. In
sum, most of the advertising effect on revenue is realized regardless of consumers’ online search,
while none of the blog effect on revenue materializes without it.
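The path arithmetic above multiplies elasticities along each chain. For blogs, the elasticity of search to blogs (0.348) is multiplied by the elasticity of revenue to search (0.234). The sketch below reproduces the blog-path figure, and the function names are illustrative. It also shows that reading an elasticity e as a 100e percent response to a doubling is a first-order approximation. Under a log-log equation the exact multiplier is 2 raised to the power e, so an elasticity of 0.225 implies roughly a 17 percent increase.

```python
import math


def path_elasticity(*links):
    """Elasticity transmitted along a causal chain: the product of its links."""
    return math.prod(links)


def exact_change_when_doubled(elasticity):
    """Exact proportional change in the outcome when a driver doubles (log-log model)."""
    return 2.0 ** elasticity - 1.0


blog_to_revenue = path_elasticity(0.348, 0.234)    # blogs -> search -> revenue
print(round(blog_to_revenue, 3))                   # 0.081
print(round(exact_change_when_doubled(0.225), 3))  # 0.169
```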
We can explain the difference between the advertising effect and the blog effect in terms of how the
two are delivered to consumers. Advertisements are available through various media, including
TV, print, radio, and the Internet, while blogs are available only on the Internet. As such, blogs
reach consumers only when consumers search for them, whereas advertisements
reach consumers regardless of their online search activity. This is consistent with
Onishi and Manchanda (2012), who find that blogs viewed, not blogs generated, matter for the
market success of new products. It also implies that the effect of WOM on sales is moderated
by online search intensity.
Second, weekly online search volume predicts weekly movie revenue better than weekly
blog volume does (compare the revenue equations in Tables 6 and 7). Furthermore, weekly blog
volume does not explain weekly revenue once weekly advertising and search volume are
controlled for (Table 5). Thus, if the purpose of an analysis is to build a model that predicts weekly
revenue, weekly search volume, not blog volume, is the variable that managers should collect.
Third, the weekly search volume of a movie is significantly associated with the weekly
Google search index of the keyword “opening movie”. In the same week, a one hundred percent
increase in the latter would lead to a 44.5 percent increase in the former. As such, the same ad
spending of a movie could generate substantially different search volumes, influenced by the
search index of the keyword “opening movie”. Knowing when the weekly search index of
“opening movie” is high during a year may help studio managers allocate advertising budgets.
The search index of the keyword “opening movie” represents how much consumers are
interested in opening movies in general, while the search volume of a movie measures how much
they are interested in the specific movie. Our finding shows that consumers’ generic interest in
opening movies triggers their search for individual movies.
Similarly, we find that the weekly blog volume of an individual movie is significantly
associated with the weekly blogging intensity of the blogger population, as measured by the
weekly reach of the five most popular blog sites. A one hundred percent increase in the weekly
reach of the five sites would lead to a 230 percent increase in the blogs of an individual movie in
the same week.
Fourth, a movie’s weekly revenue influences the search and blog volume of the movie in the
same week, implying that movie consumption is both a consequence and an antecedent of
blogging and online search. This has an implication for using the vector autoregressive (VAR)
framework (e.g., Pauwels et al. 2002) to find the long-term effects of the three variables: because blog
volume, search volume, and movie revenue have contemporaneous effects on each other,
imposing a causal order among the variables at the weekly level will bias their long-term
effects.
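The ordering problem can be illustrated with the recursive (Cholesky) identification commonly used in VAR impulse-response analysis, assumed here for illustration only. Take a hypothetical residual covariance between weekly blog and search innovations. The same-week response of search to a blog shock is 0.6 when blogs are ordered first and exactly zero when search is ordered first, even though the covariance matrix is identical in both cases.

```python
import math


def cholesky_2x2(s11, s12, s22):
    """Lower-triangular factor L with L @ L.T == [[s11, s12], [s12, s22]]."""
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]


# Hypothetical residual covariance of weekly (blog, search) innovations.
var_blog, cov_blog_search, var_search = 1.0, 0.6, 1.0

# Ordering 1: (blog, search). Column 0 of L is the blog shock.
L_blog_first = cholesky_2x2(var_blog, cov_blog_search, var_search)
search_resp_blog_first = L_blog_first[1][0]      # row = search, column = blog shock

# Ordering 2: (search, blog). Column 1 of L is the blog shock.
L_search_first = cholesky_2x2(var_search, cov_blog_search, var_blog)
search_resp_search_first = L_search_first[0][1]  # row = search, column = blog shock

print(search_resp_blog_first, search_resp_search_first)  # 0.6 0.0
```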
5.2. Pre-Launch Analysis Results
Table 8 shows the pre-launch analysis results, comparing OLS with GMM estimates.
== Table 8 about here ==
First, the OLS indicates significant contemporaneous association between blogging and
searching activity around a movie, while the GMM finds no significant causal effects between
the two activities in a same week; rather, blogging and searching in a pre-launch week are
attributed to advertising and respective exogenous variables.
The difference between the two results suggests that omitted underlying factors may
influence online search and blogging simultaneously. The significant OLS association between
search and blog volume may therefore reflect these factors rather than a causal link. For
example, the level of excitement among consumers may increase as a new movie's release
approaches. Suppose excitement about to-be-released movies influences blogging and
searching activities simultaneously but is not controlled for. We may then observe a significant
association between the blogging and searching activities, as in the OLS analysis.
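The omitted-factor argument can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's model or data. The data-generating process, its coefficient values, and the helper names `simulate_common_factor`, `ols_slope`, and `iv_slope` are all hypothetical.

In the simulation, a shared unobserved "excitement" factor drives both log blog volume and log search volume. Blog volume has no causal effect on search by construction. OLS therefore reports a clearly positive slope of search on blog volume. A just-identified instrumental-variable ratio recovers a slope near zero. It uses an observed blog-only shifter, standing in for the weekly reach of the blog sites. The paper itself uses GMM with several instruments; the simple IV ratio here is a simplified stand-in.

```python
import numpy as np


def simulate_common_factor(n=20_000, seed=0):
    """Hypothetical data-generating process (not the paper's data).

    excitement: unobserved factor that raises both blogging and searching
    reach:      observed shifter of blogging only (stand-in for blog-site reach)
    Blog volume has NO causal effect on search volume by construction.
    """
    rng = np.random.default_rng(seed)
    excitement = rng.normal(size=n)
    reach = rng.normal(size=n)
    log_blog = 0.8 * reach + excitement + rng.normal(size=n)
    log_search = excitement + rng.normal(size=n)  # true blog effect = 0
    return reach, log_blog, log_search


def ols_slope(y, x):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def iv_slope(y, x, z):
    """Just-identified IV slope with a constant: Cov(z, y) / Cov(z, x)."""
    zc = z - z.mean()
    return np.dot(zc, y - y.mean()) / np.dot(zc, x - x.mean())


reach, log_blog, log_search = simulate_common_factor()
print(f"OLS slope of search on blog: {ols_slope(log_search, log_blog):.3f}")
print(f"IV slope (reach as instrument): {iv_slope(log_search, log_blog, reach):.3f}")
```

Under this setup the OLS slope is roughly Var(excitement) / Var(log_blog), about 0.38, even though the true effect is zero.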
Second, we find that consumers respond to a movie’s pre-launch advertising by posting blogs
about and searching for the movie. However, the responsiveness is substantially different
between the two activities. The GMM finds that the elasticity of weekly blog volume to weekly
advertising is 0.028, while that of weekly search volume to advertising is 0.104 (0.053 in the
same week and 0.051 in the following week).
The difference may be explained by the behavioral cost that blogging and searching
activities incur. Online search costs a consumer just a few keystrokes, whereas blogging
requires much more effort. Furthermore, consumers have little to write about a movie in
its pre-launch period because trailers are the only information source. This suggests that in
the pre-launch period, blogging may be less responsive to advertising than searching is. Movie
studio managers should therefore consider not only WOM but also online search volume to
measure advertising effectiveness in the pre-launch period of a movie.
Third, the suggested exogenous variables (the weekly reach of the five blog sites and the
weekly Google search index of the keyword “opening movie”) are associated with their
respective endogenous variables. As such, a given amount of weekly advertising generates more blog
and search volume when consumers' overall blogging intensity and their generic interest in
movies are higher than average.
5.3. Opening-Week Analysis Results
Table 9 shows the OLS and GMM results for the opening-week model, with two different sets of
covariates. The first set includes only pre-launch blog volume whereas the second set includes
pre-launch search as well as blog volume.
== Table 9 about here ==
Pre-launch blog volume explains variations in opening-week movie revenue when pre-launch
search volume is omitted. When pre-launch search volume is added, however, pre-launch blog
volume loses its predictive power to pre-launch search volume. Furthermore, the model fit
improves when pre-launch search volume is included.
We attribute the superior predictive ability of pre-launch search volume to two facts.
First, searching is a more prevalent consumer behavior than blogging. Second, blogs
influence revenue only insofar as consumers are exposed to them, as we find in the post-launch
period analysis. This finding complements previous studies that examine the predictive
performance of only search (e.g., Kulkarni et al. 2012) or only online WOM (e.g., Gopinath et al.
2013, Liu 2006). To the best of our knowledge, this is the first study that compares the predictive
ability of pre-launch search volume and blog volume for new entertainment products. For
managers, this implies that pre-launch search activity, not pre-launch blogging activity, is the
metric they need to monitor to better predict opening-week movie revenue.
6. Implications
This study implies that monitoring consumer search activity is important to i) better allocate
marketing budgets and ii) better predict the demand for a new product. Firms invest not only in
traditional advertising but also in social media advertising through promotional chat (Dellarocas
2006, Mayzlin 2006) and firm-created WOM (Godes and Mayzlin 2009). As such, it is important
to know how traditional and social media advertising influence revenue in order to maximize
communication effectiveness. Relevant to this objective is our finding that blogs influence revenue only
through consumers' search activity, whereas traditional advertising influences revenue without it. Firms can improve
their communication effectiveness by allocating their budget according to consumers' search
intensity. For example, social media advertising can generate sales more effectively in
geographic markets where online search is more active than in markets where it is less active.
Demand forecasting is an important issue for any firm, and especially for new-product
managers. Our findings indicate that managers can better forecast demand by monitoring the search
volume of relevant keywords. When both are used to predict movie revenue, blog volume loses its
predictive power to search volume. For this purpose, the Google search index can serve
as a readily available data source.
Other implications are as follows. New-product managers want to measure pre-launch
advertising effectiveness to find a better pre-launch advertising schedule. To this end, previous
studies used virtual stock prices traded on the Hollywood Stock Exchange (HSX) (e.g., Bruce et al.
2012). However, virtual stock prices have several limitations. First, the virtual
stocks are traded not in a real market but in a simulated one. Furthermore, not all moviegoers are
involved in virtual stock trading. As such, HSX stock prices do not measure awareness
of movies in the entire consumer population. Second, virtual stock markets are not available
for all new products. For example, only movie and TV show stocks are traded on the HSX. Thus,
managers in other industries cannot conduct a similar analysis. This study suggests an alternative, and
potentially superior, metric for measuring pre-launch advertising effectiveness: pre-launch online search volume.
For researchers, this study shows that the online keyword search index and website traffic
can provide a source of exogenous variation in certain online actions of consumers. Figure 4
summarizes the key influencers of movie revenue and the resulting managerial implications.
== Figure 4 about here ==
7. Conclusions
Despite the prevalence of consumers’ media consumption activity, researchers have not paid
sufficient attention to it. Using blog and search volume of movies, this study has examined the
relationship between advertising, consumers’ media generation, media consumption and market
outcomes.
Several important findings emerge. First, there is an important difference in how
advertising and blogging activity influence movie revenue. We find that blog postings require
consumer search to influence movie revenue. Advertising, on the other hand, influences
revenue without consumers' online search activity. In fact, in the post-launch period of a movie,
the indirect effects of advertising on revenue through consumer search activity are so small
that they barely contribute to advertising's overall revenue impact.
Second, advertising is the main driver of movie revenue throughout the movie life cycle. The
opening-week revenue of a movie is influenced by its opening-week advertising and pre-launch
search volume. However, the pre-launch search volume of a movie is influenced mainly by its
advertising. As such, advertising is the dominant cause of opening-week revenue of movies. In
the post-launch period, both advertising and consumer blogging activity influence weekly movie
revenue, but the effectiveness of advertising is almost twice that of blog volumes.
Third, once online search volume of a movie is controlled for, blog volume of the movie does
not improve the performance of movie revenue predictions. Thus, for the purpose of predicting
the market success of a movie, managers should focus on search volume, not blog volume.
This study has several managerial implications. First, online search activity provides guidance
in allocating a firm's marketing budget between traditional advertising and firm-created WOM
management (or social media management). Second, to predict the opening-week revenue of a
movie, pre-launch search volume, not pre-launch WOM volume, is the metric that managers
should monitor. Likewise, to predict the post-launch weekly revenue of a movie, managers should monitor
the weekly online search volume of the movie, not its weekly blog volume. Third, almost 80% of
movie advertising is executed in the pre-launch period or during the opening week. Therefore, finding
an efficient pre-launch advertising schedule is an important task. We find that pre-launch
advertising is the main driver of pre-launch search activity. We also find that pre-launch search activity
has a substantial effect on opening-week revenue. Together, these findings suggest that studio managers may use the time
series of pre-launch search volume to measure the effectiveness of pre-launch movie advertising.
This study is subject to several limitations. First, we use blog volume to examine the effect of
consumers' media generation. Blog postings are the only consumer WOM before a
new product's launch. In the post-launch period, however, consumers express their opinions through
various channels, including review sites. Second, we do not consider the valence of blog
postings. This is less of a concern in the pre-launch period because the new product is not yet available,
so there should be no valence information. But in the post-launch period, WOM
valence can influence movie-going decisions. One way of controlling for WOM
valence is to collect user review data and include them in the model. Third, this study is conducted
in the movie industry. To generalize its findings, this study needs to be extended to
other sectors such as video games, music albums, and books.
Several research opportunities remain in this field. First, the content of blogs may influence
consumer search behavior. For example, search may be greater when there is strong
disagreement among consumers' opinions. Finding which products receive more searches from
consumers, and why, is an interesting research question. Second, pre-launch search
volume influences post-launch sales. Determining the optimal allocation of a pre-launch
advertising budget to maximize search volume is therefore important. Lastly, consumer search activity
may lead consumers to related products' websites. Examining the relationship between search
activity and traffic to product websites is another interesting topic.
Appendix A: Identification and Estimation
This appendix details how to identify and estimate the model parameters.
A.1 Identifying the Parameters of the Pre- and Post-Launch Period Models
We rely on a covariance restriction and several exclusion restrictions to identify the model
parameters. The covariance restriction identifies the effect of revenue on blog and search
volume in the post-launch period model. It uses the assumption that the error term of the revenue
equation is uncorrelated with those of the blog and search equations for any movie and week:
$\mathrm{Cov}(u^B_{it}, u^R_{it}) = \mathrm{Cov}(u^S_{it}, u^R_{it}) = 0$. The exclusion restrictions identify the effects of blog and
search volume on revenue and on each other. For each endogenous variable, we first find a variable
that creates exogenous variation in only the focal endogenous variable. Then, we include this
exogenous variable as a covariate in only the focal endogenous variable's equation (i.e., the
equation where the focal endogenous variable is the dependent variable). The exogenous
variable is also used as an instrument in the equations where the focal endogenous variable is
included as a covariate. Consider identifying the effect of weekly search volume on revenue,
that is, the parameters with superscript S and subscript k in (5). We find a variable that creates
exogenous variation in only weekly search volume. We include it in the search equation, but not in
the blog and revenue equations. We also use this exogenous variable as an instrument for weekly
search volume in the blog and revenue equations. In the following subsections, we explain how to
identify the model.
An Exogenous Variable for the Weekly Blog Volume of a Movie. We argue that the overall
weekly blogging activity of the entire blogger population provides exogenous variation in the
weekly blog volume of an individual movie. By overall activity we mean the blogging intensity of the
blogger population across all topics and issues in each week. The intuition is that the overall blogging
intensity of the blogger population in a week is likely to influence bloggers' activity on any
topic in that week. This intensity is influenced by various exogenous factors such as
weather and holidays (Gopinath et al. 2013). Thus, the overall blogging intensity of the blogger population in a
week will provide exogenous variation in the weekly blog volume of an individual movie in
that week.
The suggested exogenous variable—the weekly blogging intensity of the blogger
population—will have little influence on the weekly revenue of a specific movie given that we
control for the effects of weekly advertising, blog volume, and search volume of the movie—i.e.,
if we include weekly advertising, blog volume and search volume of the movie in the revenue
equation. Also, the exogenous variable will have little influence on consumers’ online search
activity for a specific movie given that we control for the effects of weekly advertising, blog
volume and search volume of the movie—i.e., if we include weekly advertising, blog volume
and search volume of the movie in the search equation. Finally, the weekly advertising spending
of an individual movie is not likely to be correlated with the weekly blogging intensity of the
blogger population because an individual movie’s advertising spending will hardly contribute to
the overall blogging activity of the blogger population.
The above arguments suggest that the weekly blogging intensity of the blogger population
provides exogenous variations to the weekly blog volume of an individual movie, but not to
other endogenous variables of the movie. Assuming that the weekly reach of popular blog sites
represents the overall weekly blogging activity of the blogger population, we include the weekly
reach of the five most popular blog sites (blogger.com, tumblr.com, wordpress.com,
squarespace.com, and posterous.com) as a covariate of the blog equation. We use the exogenous
variable as an instrument in the search and revenue equations.
An Exogenous Variable for the Weekly Search Volume of a Movie. We argue that the weekly
Google search index of the keyword “opening movie” provides exogenous variation for the
weekly search volume of an individual movie. The weekly search index of the keyword “opening
movie”, which reflects consumers’ generic interest in movies, measures the seasonality in the
movie industry (see Figure 2). As the seasonality is mainly determined by exogenous factors, the
weekly search index of the keyword “opening movie” contains exogenous variations. Moreover,
the generic interest in movies will influence the search volume of an individual movie as
consumers’ generic interest in movies in a week may transfer to interest in individual movies in
the week. For example, consumers may first search with the keyword “opening movies” to find
what movies are available to them and then narrow down to a few specific movies to acquire
detailed information about those movies. Rutz and Bucklin (2011) find a similar phenomenon in
the lodging industry.
The suggested exogenous variable—the weekly Google search index of the keyword
“opening movie”—will have little influence on the weekly revenue of a specific movie given that
we control for the effects of weekly advertising, blog volume, and search volume of the movie.
Also, the exogenous variable will have little influence on consumers’ blogging activity for a
specific movie given that we control for the effects of weekly advertising, blog volume and
search volume of the movie. Finally, the weekly advertising spending of an individual movie should
not contribute to the weekly search volume of the generic keyword “opening movie”. The latter
represents consumers' generic interest in opening movies without a specific movie under
consideration (e.g., Joo et al. 2012 in the financial services industry).
The above arguments suggest that the weekly search index of the keyword “opening movie”
provides exogenous variation in the weekly search volume of individual movies, but not in
other endogenous variables. We include this exogenous variable as a covariate in the search
equation and use it as an instrument in the blog and revenue equations.
Exogenous Variation in the Weekly Advertising Spending of a Movie. Elberse and Anand
(2007) argue that the week-to-week change in movie advertising spending is exogenous. The
reason is that movie studios typically purchase the vast majority of their advertising time in the
upfront market. Once the advertising schedule is set, this practice makes it extremely hard for a
movie studio to buy additional advertising time based on new information. Additional
purchases of advertising time in the opportunistic market, if any, are affected by exogenous events
such as sports broadcasts and award shows. Thus, week-to-week variation in the advertising
spending of a movie is mainly driven by exogenous factors. This implies that weekly variation in
a movie's advertising spending can be used as an instrument for the movie's weekly advertising
spending.
Based on the above argument, we use $\log A_{i,t-1} - \log A_{i,t-2}$ as an instrument for $\log A_{it}$ in
the post-launch model and the opening-week model.4 We also include movie characteristics such
as director power, star power and production budget as instruments. Movie characteristics such as
the participation of high-profile stars and directors influence the total
advertising budget of a movie (Basuroy et al. 2006, Hennig-Thurau et al. 2006). Note that we do
not need an instrument for the pre-launch period model, as the first-differenced weekly advertising in
that model is exogenous.
4 For the opening-week model, where t = 0, we use $\log A_{i,t-1} - \log A_{i,t-2}$ as an instrument for the opening-week
advertising spending, $\log A_{i0}$.
Identifying the Effect of Weekly Revenue on Weekly Blog Volume and Search Volume.
Unexpected shocks on the supply side can create exogenous variation in the weekly revenue of
individual movies. An example is an increase or decrease in the number of available theaters that
is unexpected from the consumer's perspective. However, such unexpected supply-side shocks are not readily
observable to researchers.5 To identify the effect of the weekly revenue of a movie on its weekly
blog volume and search volume, we rely on the assumption that the error term of the revenue
equation (5) is contemporaneously uncorrelated with the error terms of the blog and search
equations (3) and (4), i.e., $\mathrm{Cov}(u^R_{it}, u^B_{it}) = \mathrm{Cov}(u^R_{it}, u^S_{it}) = 0$. This assumption is justified if
unobserved factors in a post-launch week do not influence the weekly search volume, blog
volume, and revenue simultaneously in that week. Suppose, for instance, that more consumers
search, blog, and watch movies in a week that contains a national holiday than in a week that
does not. If this is the case and the model does not include a holiday dummy
variable, the assumption $\mathrm{Cov}(u^R_{it}, u^B_{it}) = \mathrm{Cov}(u^R_{it}, u^S_{it}) = 0$ may not hold. This type of unobserved
simultaneous effect can be alleviated in two ways. The first is to include weekly dummy variables,
which account for the unobserved effect of each week. The second is to include any observable variables
known to influence our endogenous variables. Together, these make the assumption reasonable.
If the model specification makes the assumption $\mathrm{Cov}(u^R_{it}, u^B_{it}) = \mathrm{Cov}(u^R_{it}, u^S_{it}) = 0$ reasonable, we
can identify the effect of weekly revenue on weekly blog and search volume by estimating (5)
first and using its residuals, $u^R_{it}$, as an instrument for $\ln(\mathrm{Revenue}_{it})$ in (3) and (4). The intuition is
that if the parameters in (5) are known, $u^R_{it}$ is effectively known. The assumption
$\mathrm{Cov}(u^R_{it}, u^B_{it}) = \mathrm{Cov}(u^R_{it}, u^S_{it}) = 0$ states that $u^R_{it}$ is uncorrelated with $u^B_{it}$ and $u^S_{it}$. However,
$u^R_{it}$ is partially correlated with $\ln(\mathrm{Revenue}_{it})$. Thus, we effectively have $u^R_{it}$ as an instrument for
$\ln(\mathrm{Revenue}_{it})$ in the blog and search equations. The estimation procedure is as follows. First, we
estimate (5) by an instrumental variable technique and save the residuals, $u^R_{it}$. Then we estimate
the blog and search equations using $u^R_{it}$ as an instrument for $\ln(\mathrm{Revenue}_{it})$. The residuals $u^R_{it}$
depend on estimates from a prior stage. This does not affect the consistency of the estimators of the blog
and search equations (Wooldridge 2002, p. 207).
5 Gopinath, Chintagunta and Venkataraman (2013) suggest a source of exogenous variation in the number of
theaters in a designated market area (DMA): the number of temporarily closed theaters in the DMA. This
information was not available to us.
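The two-step procedure above can be sketched on simulated data. This is a minimal sketch under stated assumptions. The two-equation system, its parameter values, and the helper names `tsls` and `simulate` are hypothetical, and each step uses plain 2SLS rather than the paper's GMM.

Advertising enters both equations. The blog-side shifter `reach` is excluded from the revenue equation, which identifies that equation. No variable in the revenue equation is excluded from the blog equation. The saved revenue residual is therefore the only instrument for log revenue in the blog equation. Because the structural errors are drawn independently, the covariance restriction holds by construction.

```python
import numpy as np


def tsls(y, X, Z):
    """Two-stage least squares: project X on Z, then regress y on the projection."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    resid = y - X @ beta  # residuals use the original regressors, not X_hat
    return beta, resid


def simulate(n=50_000, seed=1, a=0.3, c=0.5, g=0.4, e=0.2, d=0.8):
    """Hypothetical simultaneous system (illustrative parameters, not estimates):
         log_rev  = a*log_blog + c*adv + u_R
         log_blog = g*log_rev + e*adv + d*reach + u_B,  Cov(u_R, u_B) = 0
    """
    rng = np.random.default_rng(seed)
    adv, reach = rng.normal(size=n), rng.normal(size=n)
    u_R, u_B = rng.normal(size=n), rng.normal(size=n)
    # Reduced form for log_rev, obtained by substituting the blog equation.
    log_rev = ((a * e + c) * adv + a * d * reach + a * u_B + u_R) / (1 - a * g)
    log_blog = g * log_rev + e * adv + d * reach + u_B
    return adv, reach, log_rev, log_blog


adv, reach, log_rev, log_blog = simulate()
const = np.ones_like(adv)

# Step 1: revenue equation by IV; reach is excluded from it and instruments log_blog.
X_rev = np.column_stack([const, log_blog, adv])
Z_rev = np.column_stack([const, adv, reach])
beta_rev, u_R_hat = tsls(log_rev, X_rev, Z_rev)

# Step 2: blog equation; the saved revenue residual instruments log_rev.
X_blog = np.column_stack([const, log_rev, adv, reach])
Z_blog = np.column_stack([const, u_R_hat, adv, reach])
beta_blog, _ = tsls(log_blog, X_blog, Z_blog)

print("revenue eq. (const, blog, adv):", beta_rev.round(3))
print("blog eq. (const, rev, adv, reach):", beta_blog.round(3))
```

The second step is valid only because $u^R_{it}$ is uncorrelated with the blog-equation error. If a common weekly shock entered both errors, the residual would no longer be a valid instrument.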
A.2 Identifying the Parameters of the Opening-Week Model
The opening-week model (6) is a cross-sectional data model. Four covariates are potentially correlated
with the error term: opening-week advertising, the number of opening-week screens, and pre-launch
search and blog volume. We use movie characteristics as common instruments for all the endogenous variables. As
an additional instrument for the pre-launch blog volume of a movie, we use the total pre-launch blog
traffic of the movie, which is the sum of weekly traffic to the five blog sites during the movie's
pre-launch weeks. Also, as an additional instrument for the pre-launch search volume of a movie, we
use the sum of the weekly Google search index of “opening movie” during the movie's pre-launch
weeks. Lastly, we use $\log A_{i,t-1} - \log A_{i,t-2}$ as an instrument for the opening-week advertising
spending, $\log A_{i0}$.
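The instrument $\log A_{i,t-1} - \log A_{i,t-2}$ used in A.1 and A.2 can be built from each movie's weekly spending series. The sketch below is hypothetical. The function name `lagged_ad_change` and the week-indexing convention (0 = opening week, negative = pre-launch weeks) are assumptions. So is the choice to return None when a lag is missing or not positive, since the paper does not state how such weeks are handled.

```python
import math


def lagged_ad_change(spend):
    """Return {t: log(A_{t-1}) - log(A_{t-2})} for every week t in `spend`.

    `spend` maps a week index to one movie's advertising spending. Assumed
    convention (hypothetical): 0 = opening week, negative = pre-launch weeks.
    Weeks whose two lags are missing or not positive map to None (an
    assumption), because the logarithm is undefined there.
    """
    z = {}
    for t in sorted(spend):
        a1, a2 = spend.get(t - 1), spend.get(t - 2)
        if a1 is None or a2 is None or a1 <= 0 or a2 <= 0:
            z[t] = None
        else:
            z[t] = math.log(a1) - math.log(a2)
    return z


# Hypothetical weekly spending (in $ millions) for a single movie.
spend = {-2: 4.0, -1: 8.0, 0: 10.0, 1: 5.0, 2: 2.0}
z = lagged_ad_change(spend)
print(z[0])  # instrument for opening-week advertising, log(A_i0)
```

Because the instrument uses only weeks t-1 and t-2, it is predetermined relative to the week-t error. This is the property the upfront-market argument relies on.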
A.3 Estimation of the Pre- and Post-Launch Period Models
We apply a generalized method of moments (GMM) procedure to each equation of each model.
Let $i$ be the index for a movie ($i = 1, \ldots, N$), where $N = 153$. Let $t$ be the index for time ($t = 1, \ldots,
T_i$). In the pre-launch period model, $T_i = 30$ for each movie; in the post-launch period model, $T_i \le
10$. Let $y_{it}$ be the dependent variable of the estimation equation, $x_{it}$ be the corresponding row
vector of explanatory variables, and $z_{it}$ be the corresponding row vector of instruments. For
movie $i$, let $y_i$ be the $T_i \times 1$ vector of the dependent variable of the focal equation, obtained by
stacking $y_{it}$ for $t = 1, \ldots, T_i$. $X_i$ and $Z_i$ are similarly constructed by stacking $x_{it}$ and $z_{it}$.
For each equation, the GMM estimation steps are as follows. (i) For the focal equation, apply
two-stage least squares (2SLS) estimation and obtain the residuals. (ii) Use these residuals to
obtain a GMM weighting matrix that is robust to arbitrary serial correlation of the error term.
(iii) With this weighting matrix, estimate the parameters of the focal equation by GMM. The
GMM weighting matrix in step (ii) is given in (A-1).
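(A-1) $$\hat{W} = \left( N^{-1} \sum_{i=1}^{N} Z_i' \hat{u}_i \hat{u}_i' Z_i \right)^{-1},$$
where $\hat{u}_i$ is the $T_i \times 1$ vector of residuals obtained from the 2SLS regression in (i). The GMM
estimator and its asymptotic robust covariance matrix are
(A-2) $$\hat{\beta}_{GMM} = \left( X'Z\hat{W}Z'X \right)^{-1} X'Z\hat{W}Z'Y,$$
$$\hat{V}(\hat{\beta}_{GMM}) = \left( \sum_{i=1}^{N} \hat{X}_i'\hat{X}_i \right)^{-1} \left( \sum_{i=1}^{N} \hat{X}_i'\hat{u}_i\hat{u}_i'\hat{X}_i \right) \left( \sum_{i=1}^{N} \hat{X}_i'\hat{X}_i \right)^{-1},$$
where $X$, $Z$, and $Y$ are obtained by stacking $X_i$, $Z_i$, and $y_i$ for $i = 1, \ldots, N$,
$\hat{X}_i = Z_i (Z_i'Z_i)^{-1} Z_i' X_i$, and $\hat{u}_i = y_i - X_i \hat{\beta}_{GMM}$. In the post-launch analysis, we first estimate the
revenue equation (5) and obtain $\hat{u}^R_{it} = y_{it} - x_{it} \hat{\beta}_{GMM}$, the GMM residual of the revenue equation.
Then we use $\hat{u}^R_{it}$ as an instrument for $\log(\mathrm{Revenue}_{it})$ in estimating the blog and search equations.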
The following variables are used as instruments.
Blog equation: previous week’s advertising variation, weekly search index of the
keyword “opening movie,” weekly traffic to the five blog sites, the holiday dummy variable,
movie characteristics, the week dummy variables, and residuals from the revenue equation (for
the post-launch period model).
Search equation: previous week’s advertising variation, weekly search index of the
keyword “opening movie,” weekly traffic to the five blog sites, the holiday dummy variable,
movie characteristics, the week dummy variables, and residuals from the revenue equation (for
the post-launch period model).
Revenue equation: previous week’s advertising variation, weekly search index of the
keyword “opening movie,” weekly traffic to the five blog sites, the holiday dummy variable,
movie characteristics, and the week dummy variables.
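Steps (i) to (iii) and the weighting matrix in (A-1) can be sketched with NumPy. This is an illustrative sketch, not the authors' code. The names `panel_gmm` and `simulate_panel` are hypothetical. The simulated panel is made up: AR(1) errors within each movie, one endogenous regressor, and two excluded instruments. The function returns only the point estimate $\hat\beta_{GMM}$ from (A-2), not its covariance matrix.

```python
import numpy as np


def panel_gmm(y_list, X_list, Z_list):
    """Two-step GMM with a weighting matrix robust to within-movie serial correlation.

    y_list[i], X_list[i], Z_list[i] hold movie i's T_i stacked rows of y, X, Z.
    Returns the point estimate only (no covariance matrix).
    """
    Y = np.concatenate(y_list)
    X = np.vstack(X_list)
    Z = np.vstack(Z_list)
    N = len(y_list)

    # Step (i): pooled 2SLS on the stacked data.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    b_2sls = np.linalg.lstsq(X_hat, Y, rcond=None)[0]

    # Step (ii): W = (N^-1 * sum_i Z_i' u_i u_i' Z_i)^-1 with 2SLS residuals u_i.
    S = np.zeros((Z.shape[1], Z.shape[1]))
    for y_i, X_i, Z_i in zip(y_list, X_list, Z_list):
        g_i = Z_i.T @ (y_i - X_i @ b_2sls)  # movie i's summed moment vector
        S += np.outer(g_i, g_i)
    W = np.linalg.inv(S / N)

    # Step (iii): b = (X'Z W Z'X)^-1 X'Z W Z'Y.
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ Y))


def simulate_panel(N=300, T=10, seed=2):
    """Hypothetical panel: y = 1 + 2x + u, u AR(1) within movie, x endogenous."""
    rng = np.random.default_rng(seed)
    ys, Xs, Zs = [], [], []
    for _ in range(N):
        z1, z2 = rng.normal(size=T), rng.normal(size=T)
        u = np.zeros(T)
        for t in range(T):
            u[t] = (0.6 * u[t - 1] if t > 0 else 0.0) + rng.normal()
        x = z1 + 0.5 * z2 + 0.5 * u + rng.normal(size=T)  # correlated with u
        ys.append(1.0 + 2.0 * x + u)
        Xs.append(np.column_stack([np.ones(T), x]))
        Zs.append(np.column_stack([np.ones(T), z1, z2]))
    return ys, Xs, Zs


ys, Xs, Zs = simulate_panel()
print("GMM estimates (const, x):", panel_gmm(ys, Xs, Zs).round(3))
```

Suppose each movie is passed as a single observation, using one-row arrays. Every "cluster" is then a singleton, and the same weighting matrix is robust only to cross-sectional heteroskedasticity. This mirrors the adjustment described for the opening-week model in A.4.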
A.4 Estimation of the Opening-Week Model
The opening-week model is a cross-sectional data model. The GMM estimator for this model is
similar to (A-1) and (A-2), except that cross-sectional heteroskedasticity, instead of
arbitrary serial correlation, is accounted for in constructing the weighting matrix $\hat{W}$.
Web Appendix: Constructing Cross-Sectionally Comparable Search Volume Measures
from the Google Search Index
Google Trends provides weekly search indices of keyword queries entered into the Google
search engine. Because the index is normalized to conceal the actual search volume of the
keyword, researchers cannot compare the search volumes across different keywords if the raw
search index is used as provided by Google Trends. In this section, we introduce a methodology
to transform the weekly search indices from Google into cross-sectionally comparable search
volume metrics. The cross-sectionally comparable search volume metrics introduced in the data
section is acquired by applying the following methodology to the weekly Google search index of
the focal movies.
The method consists of three steps. The first is the keyword selection step, where basis
keywords and movie keywords are selected. Any set of words can be selected as basis
keywords. The only requirement is that their search volume is neither too high nor too low
compared with the search volume of the focal movies. For our analysis, we select the following
eight basis keywords: “mac os,” lamp, hello, windows, weather, tomatoes, video, and imdb.
They are listed in increasing order of search volume in the U.S. movie industry. That is, among the eight
keywords, “mac os” is the least searched keyword and “imdb” is the most searched keyword in
the U.S. movie industry. Then, for each movie, we select the keywords that consumers are
most likely to use when searching for the movie. For example, for the movie 12 Rounds, we
choose “12 Rounds” as the keyword for the movie. For the movie Paul Blart: Mall Cop, we
choose “blart + mall cop,” which means either blart or “mall cop.”6
6 The selection of movie keywords is guided by the “Related terms” section of Google Trends. The chosen keywords
for each movie can be acquired upon request.
The second step is the keyword matching step. To each movie, we assign an appropriate
basis keyword and collect the Google search index of the movie keyword along with that of the
assigned basis keyword to the movie. Any basis keyword can be assigned to any movie as long
as the search index of the movie keyword is comparable to that of the chosen basis keyword for
the movie. That is, if the search volume of a certain basis keyword is too large compared to the
search volume of a movie keyword, that basis keyword should not be used for that movie
because the movie’s search index so collected will be shrunk to zero for many or all of the
weeks. Google Trends provides diverse filters to minimize the measurement error in collecting
intended search indices. We limit our search so that the search volume is measured only from the
U.S. movie industry.
The last step is the transformation step. We transform each movie's search index into our
cross-sectionally comparable search volume measure. The mathematics behind this step can be
explained as follows. Let $k_j$ be the basis keyword at the $j$'th position (i.e., $k_1$ = “mac os”, $k_2$ =
lamp, …, $k_8$ = imdb), and let $I^{k_j}_t$ represent the search index of the $j$'th basis keyword at week $t$.
We calculate the ratio of the Google search indices of two adjacent basis keywords,
$r^{j,j-1}_t = I^{k_j}_t / I^{k_{j-1}}_t$, for each $t$ and for all seven pairs of adjacent basis keywords. Let $I^m_t$ be the search
index of movie $m$ at week $t$. Suppose that, in the second step, the basis keyword of position $j$ was
assigned to movie $m$. Then, for movie $m$ at week $t$, our cross-sectionally comparable search
volume measure, denoted by $S^m_t$, is calculated as in (WA-1):
(WA-1) $$S^m_t = I^m_t \left( r^{j,j-1}_t \times \cdots \times r^{2,1}_t \times r^{1,0}_t \right),$$
where $r^{1,0}_t$ is the weekly search index of the basis keyword “mac os” collected together with the
keyword “lamp”. For example, if movie $m$ is compared with the basis keyword of the eighth
position (i.e., “imdb”), then $S^m_t = I^m_t \left( r^{8,7}_t \times \cdots \times r^{2,1}_t \times r^{1,0}_t \right)$ for that movie. If movie $m$ is compared
with the basis keyword of the first position (i.e., “mac os”), then $S^m_t = I^m_t \, r^{1,0}_t$ for movie $m$.
Figure WA.1(a) shows the weekly multiplier associated with each basis keyword, i.e.,
$r^{j,j-1}_t \times \cdots \times r^{2,1}_t \times r^{1,0}_t$ if the keyword is at the $j$'th position. Figure WA.1(b) shows two series for the movies
Zombieland and X-Men Origins: Wolverine: the raw search indices from Google Trends and
their transformed cross-sectionally comparable search volume measures. Both span 60 weeks before
the movies' releases to 10 weeks after their releases. Note that our transformed search volume
measures reveal a substantial difference in consumer search activity between the two movies.
== Figure WA.1 about here ==
References
Baltagi, B. H. 1995. Econometric Analysis of Panel Data. John Wiley & Sons. New York, NY.
Basuroy, S., K. K. Desai, D. Talukdar. 2006. An empirical investigation of signaling in the
motion picture industry. Journal of Marketing Research 43 (May) 287-295.
Bruce, N. I., N. Z. Foutz, C. Kolsarici. 2012. Dynamic effectiveness of advertising and word of
mouth in sequential distribution of new products. Journal of Marketing Research 49
(August) 469-486.
Chevalier, J., D. Mayzlin. 2006. The effect of word of mouth on sales: online book reviews.
Journal of Marketing Research 43 (3) 345-354.
Chintagunta, P. K., S. Gopinath, S. Venkataraman. 2010. The effects of online user reviews on
movie box office performance: accounting for sequential rollout and aggregation across local
markets. Marketing Science 29 (5) 944-957.
Dellarocas, C. 2006. Strategic manipulation of internet opinion forums: implications for
consumers and firms. Management Science 52 (10) 1577-1593.
Dhar, V., E. A. Chang. 2009. Does chatter matter? The impact of user-generated content on
music sales. Journal of Interactive Marketing 23 (4) 300-307.
Duan, W., B. Gu, A. B. Whinston. 2008a. The dynamics of online word-of-mouth and product
sales – an empirical investigation of the movie industry. Journal of Retailing 84 (2) 233-242.
______, ______, ______. 2008b. Do online reviews matter? – an empirical investigation of panel
data. Decision Support Systems 45 (4) 1007-1016.
Elberse, A., B. Anand. 2007. The effectiveness of pre-release advertising for motion pictures: an
empirical investigation using a simulated market. Information Economics and Policy 19 319-
343.
______, J. Eliashberg. 2003. Demand and supply dynamics for sequentially released products in
international markets: the case of motion pictures. Marketing Science 22 (3) 329-354.
Enders, W. 2004. Applied Econometric Time Series. John Wiley & Sons. Hoboken, NJ.
Godes, D., D. Mayzlin. 2009. Firm-created word-of-mouth communication: evidence from a
field test. Marketing Science 28 (4) 721-739.
Gopinath, S., P. K. Chintagunta, S. Venkataraman. 2013. Blogs and local-market movie box-
office performance. Forthcoming in Management Science.
Hanssens, D. M. 2009. Empirical Generalizations about Marketing Impact. Marketing Science
Institute. Cambridge, MA.
Hennig-Thurau, T., M. B. Houston, S. Sridhar. 2006. Can good marketing carry a bad product?
Evidence from the motion picture industry. Marketing Letters 17(3) 205-219.
Holtz-Eakin, D. 1988. Testing for individual effects in autoregressive models. Journal of
Econometrics 39 297-307.
Joo, M., K. C. Wilbur, Y. Zhu. 2012. Television advertising and online search. SSRN Working
Paper.
Kulkarni, G., P. K. Kannan, W. Moe. 2012. Using online search data to forecast new product
sales. Decision Support Systems 52 (3) 604-611.
Liu, Y. 2006. Word of mouth for movies: its dynamics and impact on box office revenue.
Journal of Marketing 70 (July) 74-89.
Mayzlin, D. 2006. Promotional chat on the internet. Marketing Science 25 (2) 155-163.
Onishi, H., P. Manchanda. 2012. Marketing activity, blogging, and sales. International Journal
of Research in Marketing 29 (3) 221-234.
Pauwels, K., D. M. Hanssens, S. Siddarth. 2002. The long-term effects of price promotions on
category incidence, brand choice, and purchase quantity. Journal of Marketing Research 39
(November) 421-439.
Rutz, O. J., R. E. Bucklin. 2011. From generic to branded: a model of spillover in paid search
advertising. Journal of Marketing Research 48 (February) 87-102.
The Pew Research Center. 2012. Pew Internet & American Life Project Tracking Surveys.
available at http://pewinternet.org/Trend-Data-(Adults)/Online-Activites-Total.aspx.
Trusov, M., R. E. Bucklin, K. Pauwels. 2009. Effects of word-of-mouth versus traditional
marketing: findings from an internet social networking site. Journal of Marketing 73
(September) 90-102.
Villanueva, J., S. Yoo, D. M. Hanssens. 2008. The impact of marketing-induced versus
word-of-mouth customer acquisition on customer equity growth. Journal of Marketing Research 45
(February) 48-59.
Wooldridge, J. M. 2002. Econometric Analysis of Cross Section and Panel Data. The MIT Press.
Cambridge, MA.
Yang, S., M. Hu, R. S. Winer, H. Assael, X. Chen. 2013. An empirical study of word-of-mouth
generation and consumption. Forthcoming in Marketing Science.
Table 1: Comparison of Relevant Studies
Variables examined: Traditional Media; Consumer-Generated Media; Search or Media Consumption; Market Outcome
Study | Variables examined (√)
Chevalier and Mayzlin (2006); Dhar and Chang (2009); Duan et al. (2008ab); Liu (2006) | √ √
Chintagunta et al. (2010); Gopinath et al. (2013); Onishi and Manchanda (2012); Trusov et al. (2009); Villanueva et al. (2008) | √ √ √
Joo et al. (2012) | √ √
Kulkarni et al. (2012) | √ √ √
Yang et al. (2013) | √ √ √
This study | √ √ √ √
Table 2: Variables and Data Sources
Category | Variable | Source of Data
Marketing activities | Weekly advertising spending | Nielsen
Marketing activities | Weekly number of screens | The Numbers
Focal endogenous variables | Weekly blog postings | Google blog search engine
Focal endogenous variables | Weekly search volume | Google Trends
Focal endogenous variables | Weekly revenue | The Numbers
Movie characteristics | Genre, MPAA rating, sequel | IMDb
Movie characteristics | Average critic rating [range: 1 – 100] | Metacritic
Movie characteristics | Director power variables | IMDb
Monthly seasonality | January-April; May-August; September-October; November-December | Einav (2007); Ho et al. (2009)
Holiday | National holiday |
Others | Weekly Google search index of the keyword “opening movie” | Google
Others | Daily reach of the five popular blog sites | Alexa.com
Table 3: Descriptive Statistics
(a) Pre-Launch Period (N = 153, t = -30 ~ -1)
Mean Median Std. Dev. Min Max
Weekly advertising spending ($ 000) 389.5 0 1,250.0 0 10,009
Weekly blog postings 10.2 2 40.0 0 1,279
Weekly search volume 13,087.8 2,188.0 46,616.2 0 1,366,626
(b) Opening Week (N = 153, t = 0)
Mean Median Std. Dev. Min Max
Advertising spending ($ 000) 5,080.2 5,651.1 2,960.4 0.022 12,562
Blog postings 177.2 45 483.5 2 4,733
Search volume 107,624 49,774.6 194,005.4 651 1,531,028
Screens 7,254.0 8,262 4,915.4 9 21,625
Revenue ($ 000) 22,960.5 14,118.4 29,393.3 46.2 200,077.3
(c) Post-Launch Period (N = 153, t = 1 ~ 10)
Mean Median Std. Dev. Min Max
Weekly advertising spending ($ 000) 429.2 3.4 990.3 0 8,318.2
Weekly blog postings 66.9 10 621.9 0 18,543
Weekly search volume 41,710.8 13,500 106,197.4 370 2,009,781
Weekly screens 8,993.7 5,152 8,546.5 14 30,479
Weekly revenue ($ 000) 5,189.5 928.8 11,273.8 0 139,403.7
(d) Movie Characteristics (N = 153) and Other Instruments
Mean Median Std. Dev. Min Max
Critic Rating [0 ~ 100] 57.5 58.4 15.5 14.3 92.7
Production budget ($ 000) 59,041.1 40,000 54,694.9 11 250,000.0
Past B-O revenue of the focal director $873 M $451 M $1,090 M $0 M $6,520 M
Average director rating from the past 6.73 6.78 0.66 4.81 8.71
S. D. of director ratings from the past 1.98 1.96 0.27 1.46 3.45
Genre (%) Action: 21.6, Comedy: 28.1, Drama: 19.4
MPAA (%) G: 2.6, PG: 20.7, PG13: 43.0, R: 33.7
Sequel 15 movies (9.8%) are sequels.
Monthly Seasonality (%) Jan – Apr: 24.3, May – Aug: 31.1, Sep – Oct: 17.6
Weekly search index of keyword “opening movie” [0 ~ 100] 43.6 40.0 14.2 19.0 100.0
Weekly traffic to five blog sites (000) 572.1 547.6 90.5 441.6 914.1
Table 4: Serial Correlation Coefficients of the Main Variables
(a) The Variables in the Pre-Launch Period Model
Δ(Advertising) Δ(Search) Δ(Blog)
Serial Correlation Coefficient -0.181 -0.234 -0.235
Δ represents first-differencing.
(b) The Variables in the Post-Launch Period Model
Advertising Blogs Search Revenue
Serial Correlation Coefficient 0.707 0.505 0.903 0.895
Table 5: Estimation Results: Post-Launch Period (N=153, T=10)
(a) Blog equation (DV: log of weekly blog volume)
OLS GMM
Variable Coef. SE P-val. Coef. SE P-val.
Advertising 0.102 0.037 0.006 *** 0.086 0.041 0.037 **
Searching 0.344 0.076 0.000 *** 0.460 0.164 0.005 ***
Revenue 0.092 0.069 0.181 0.240 0.133 0.071 *
Holiday -0.108 0.093 0.242 -0.161 0.079 0.040 **
Traffic to the five blog sites 1.867 0.568 0.001 *** 2.301 0.556 0.000 ***
R2 0.278 N.A.
Adj. R2 0.269 N.A.
SSR 1.358 1.393
*** P-val < 0.01, ** P-val < 0.05, * P-val < 0.1. Note: The intercept and fixed effects are not reported.
(b) Search equation (DV: log of weekly search volume)
OLS GMM
Variable Coef. SE P-val. Coef. SE P-val.
Advertising 0.062 0.034 0.073 * 0.104 0.029 0.000 ***
Blogging 0.141 0.045 0.002 *** 0.348 0.092 0.000 ***
Revenue 0.426 0.062 0.000 *** 0.369 0.097 0.000 ***
Holiday 0.109 0.063 0.085 * 0.050 0.051 0.322
Search index of keyword “opening movie” 0.318 0.180 0.077 * 0.445 0.183 0.015 **
R2 0.510 N.A.
Adj. R2 0.504 N.A.
SSR 0.971 1.026
*** P-val < 0.01, ** P-val < 0.05, * P-val < 0.1. Note: The intercept and fixed effects are not reported.
(c) Revenue equation (DV: log of weekly revenue)
OLS GMM
Variable Coef. SE P-val. Coef. SE P-val.
Advertising 0.196 0.014 0.000 *** 0.225 0.021 0.000 ***
Blogging 0.052 0.025 0.041 ** 0.045 0.045 0.321
Searching 0.221 0.036 0.000 *** 0.234 0.061 0.000 ***
Holiday 0.107 0.038 0.004 *** 0.101 0.034 0.003 ***
Screens 0.769 0.033 0.000 *** 0.799 0.057 0.000 ***
R2 0.918 N.A.
Adj. R2 0.917 N.A.
SSR 0.543 0.551
*** P-val < 0.01, ** P-val < 0.05, * P-val < 0.1. Note: The intercept and fixed effects are not reported.
Table 6: Estimation Results: The Post-Launch Period Model Omitting Search Volume
(a) Blog equation (DV: log of weekly blog volume)
OLS GMM
Variable Coef. SE P-val. Coef. SE P-val.
Advertising 0.097 0.035 0.006 *** 0.233 0.029 0.000 ***
Searching N.A. N.A. N.A. N.A. N.A. N.A.
Revenue 0.281 0.063 0.000 *** 0.422 0.142 0.003 ***
Holiday -0.072 0.089 0.416 -0.152 0.089 0.090 *
Traffic to the five blog sites 1.637 0.563 0.004 *** 1.933 0.553 0.001 ***
R2 0.241 N.A.
Adj. R2 0.234 N.A.
SSR 1.388 1.466
*** P-val < 0.01, ** P-val < 0.05, * P-val < 0.1. Note: Fixed effects and intercepts are not reported.
(b) Revenue equation (DV: log of weekly revenue)
OLS GMM
Variable Coef. SE P-val. Coef. SE P-val.
Advertising 0.237 0.016 0.000 *** 0.273 0.021 0.000 ***
Blogging 0.093 0.026 0.001 *** 0.140 0.039 0.000 ***
Searching N.A. N.A. N.A. N.A. N.A. N.A.
Holiday 0.174 0.040 0.000 *** 0.157 0.038 0.000 ***
Screens 0.821 0.030 0.000 *** 0.925 0.065 0.000 ***
R2 0.902 N.A.
Adj. R2 0.901 N.A.
SSR 0.600 0.622
*** P-val < 0.01, ** P-val < 0.05, * P-val < 0.1. Note: The intercept and fixed effects are not reported.
Table 7: Estimation Results: The Post-Launch Period Model Omitting Blog Volume
(a) Search equation (DV: log of weekly search volume)
OLS GMM
Variable Coef. SE P-val. Coef. SE P-val.
Advertising 0.068 0.035 0.055 * 0.160 0.030 0.000 ***
Blogging N.A. N.A. N.A. N.A. N.A. N.A.
Revenue 0.501 0.058 0.000 *** 0.512 0.089 0.000 ***
Holiday 0.102 0.060 0.090 * 0.001 0.065 0.985
Search index of keyword “opening movie” 0.309 0.172 0.073 * 0.340 0.166 0.041 **
R2 0.509 N.A.
Adj. R2 0.504 N.A.
SSR 0.991 1.021
*** P-val < 0.01, ** P-val < 0.05, * P-val < 0.1. Note: The intercept and fixed effects are not reported.
(b) Revenue equation (DV: log of weekly revenue)
OLS GMM
Variable Coef. SE P-val. Coef. SE P-val.
Advertising 0.209 0.014 0.000 *** 0.230 0.022 0.000 ***
Blogging N.A. N.A. N.A. N.A. N.A. N.A.
Searching 0.214 0.034 0.000 *** 0.293 0.057 0.000 ***
Holiday 0.113 0.037 0.002 *** 0.071 0.037 0.054 *
Screens 0.757 0.031 0.000 *** 0.799 0.059 0.000 ***
R2 0.912 N.A.
Adj. R2 0.911 N.A.
SSR 0.547 0.563
*** P-val < 0.01, ** P-val < 0.05, * P-val < 0.1. Note: The intercept and fixed effects are not reported.
Table 8: Estimation Results: Pre-Launch Period (N=153, T=30)
(a) Blog equation (DV: log of weekly blog volume)
OLS GMM
Variable Coef. SE P-val. Coef. SE P-val.
Ad, same week 0.020 0.009 0.031 ** 0.027 0.011 0.010 ***
one week ago 0.012 0.007 0.110 0.015 0.009 0.114
two weeks ago -0.004 0.007 0.607 -0.011 0.010 0.289
three weeks ago 0.007 0.009 0.430 -0.001 0.011 0.957
four weeks ago 0.015 0.010 0.132 0.012 0.009 0.214
Searching, same week 0.101 0.015 0.000 *** -0.042 0.109 0.701
one week ago -0.018 0.013 0.159 0.015 0.113 0.896
two weeks ago -0.019 0.011 0.089 * 0.072 0.097 0.458
three weeks ago -0.029 0.009 0.001 *** -0.008 0.136 0.951
four weeks ago -0.029 0.010 0.003 *** -0.118 0.099 0.234
Holiday 0.001 0.024 0.975 -0.002 0.024 0.937
Traffic to the five blog sites 1.564 0.406 0.000 *** 1.369 0.392 0.001 ***
R2 0.043 N.A.
Adj. R2 0.035 N.A.
SSR 0.754 0.786
Corr. coef. between actual and fitted values
in level 0.750 0.730
*** P-val < 0.01, ** P-val < 0.05, * P-val <0.1. Note: Fixed effects are not reported.
(b) Search equation (DV: log of weekly search volume)
OLS GMM
Variable Coef. SE P-val. Coef. SE P-val.
Ad, same week 0.041 0.011 0.000 *** 0.053 0.014 0.000 ***
one week ago 0.036 0.010 0.001 *** 0.051 0.016 0.001 ***
two weeks ago -0.016 0.007 0.033 ** -0.007 0.014 0.627
three weeks ago -0.005 0.009 0.584 0.005 0.016 0.759
four weeks ago -0.006 0.007 0.389 -0.008 0.017 0.646
Blogging, same week 0.310 0.036 0.000 *** 0.020 0.320 0.950
one week ago 0.221 0.031 0.000 *** 0.235 0.258 0.363
two weeks ago 0.064 0.025 0.010 *** -0.157 0.404 0.698
three weeks ago 0.037 0.024 0.121 0.058 0.316 0.854
four weeks ago 0.001 0.023 0.969 0.137 0.242 0.573
Holiday 0.038 0.030 0.205 0.056 0.047 0.238
Search index of keyword “opening movie” 0.226 0.073 0.002 *** 0.237 0.085 0.005 ***
R2 0.064 N.A.
Adj. R2 0.057 N.A.
SSR 1.078 1.121
Corr. coef. between actual and fitted values
in level 0.923 0.680
*** P-val < 0.01, ** P-val < 0.05, * P-val <0.1. Note: Fixed effects are not reported.
Table 9: Estimation Results: Opening-Week (N=153)
(a) OLS Estimation
(1) Without pre-launch search volume (2) With pre-launch search volume
Variable Coef. SE P-val. Coef. SE P-val.
Advertising 0.252 0.083 0.003 *** 0.163 0.075 0.032 **
Screens 0.698 0.037 0.000 *** 0.672 0.033 0.000 ***
Pre-launch blog volume 0.120 0.037 0.001 *** 0.021 0.035 0.543
Pre-launch search volume N.A. N.A. N.A. 0.218 0.037 0.000 ***
Holiday 0.326 0.148 0.029 ** 0.277 0.122 0.024 **
Critical Review 0.541 0.210 0.011 ** 0.507 0.181 0.006 ***
R2 0.904 0.928
Adj. R2 0.900 0.925
SSR 0.655 0.567
*** P-val < 0.01, ** P-val < 0.05, * P-val <0.1. Note: The intercept is not reported.
(b) GMM Estimation
(1) Without pre-launch search volume (2) With pre-launch search volume
Variable Coef. SE P-val. Coef. SE P-val.
Advertising 0.303 0.146 0.041 ** 0.272 0.144 0.062 *
Screens 0.727 0.073 0.000 *** 0.655 0.063 0.000 ***
Pre-launch blog volume 0.211 0.062 0.001 *** -0.026 0.075 0.728
Pre-launch search volume N.A. N.A. N.A. 0.354 0.059 0.000 ***
Holiday 0.369 0.148 0.014 ** 0.167 0.120 0.165
Critical Review 0.501 0.267 0.063 * 0.513 0.209 0.016 **
R2 N.A. N.A.
Adj. R2 N.A. N.A.
SSR 0.686 0.599
*** P-val < 0.01, ** P-val < 0.05, * P-val <0.1. Note: The intercept is not reported.
Figure 1: Weekly Trends of Advertising, Blog Volume, Search Volume and Box Office Revenue
(a) Average Weekly Advertising Spending and Box Office Revenue (N=153)
(b) Average Weekly Blog Volume and Box Office Revenue (N=153)
(c) Average Weekly Search Volume and Box Office Revenue (N=153)
[Plots: each panel shows average weekly B-O revenue ($ 000, 0-25,000) together with, respectively, advertising ($ 000, 0-6,000), blogs (0-250), and search volume (000, 0-120), against week (-30 to 10).]
Figure 2: Weekly Search Index of the Keyword “Opening Movie” and Weekly Traffic to the Five Blog Sites
[Plot: weekly search index of "opening movie" (0-120) and reach of the blog sites (000, 0-1,000), against week, from 1/6/2008 to 12/6/2009.]
Figure 3: Comparison of Statistically Significant Relationships in the Three Versions of the Post-Launch Period Model
(a) The Full Model [diagram nodes: Ad_t, Blog_t, Search_t, Revenue_t]
(b) The Model Omitting Search Activity [diagram nodes: Ad_t, Blog_t, Revenue_t]
(c) The Model Omitting Blogging Activity [diagram nodes: Ad_t, Search_t, Revenue_t]
Figure 4: Key Influencers on Revenue
Opening-week revenue
Key influencers:
• Pre-launch search volume
• Opening-week advertising
Implications:
• Monitoring pre-launch search activity can help managers better predict the opening-week revenues of movies.
• The time series of a movie's pre-launch search volume can be used to develop an efficient pre-launch advertising schedule for the movie.
Weekly revenue after opening week
Key influencers:
• Weekly advertising
• Weekly online search activity
Implications:
• For the purpose of predicting weekly revenue, knowing weekly advertising and search volume is sufficient; weekly blog volume does not contribute to the predictive performance.
• Advertising influences revenue without the help of consumer search; online WOM needs consumer search activity to influence movie revenue.
Figure WA.1: Constructing a Cross-Sectionally Comparable Search Volume Measure
(a) Weekly Multiplier Associated With Basis Keywords (Zombieland; X-Men Origins: Wolverine)
[Plot: weekly multiplier (0-25,000) against week for the basis keywords imdb, video, tomatoes, weather, windows, hello, lamp, and mac os.]
(b) Raw Search Indexes and Transformed Search Volume Measures
[Plots: Google search index (0-100) against week (-60 to 10) for each movie, and the transformed measure (0-400,000) against week (-60 to 10) for X-Men Origins: Wolverine and Zombieland.]