SITUATIONS MATTER: UNDERSTANDING HOW ...SITUATIONS MATTER: UNDERSTANDING HOW INDIVIDUAL BROWSING...
Transcript of SITUATIONS MATTER: UNDERSTANDING HOW ...SITUATIONS MATTER: UNDERSTANDING HOW INDIVIDUAL BROWSING...
Journal of Electronic Commerce Research, VOL 21, NO 2, 2020
Page 113
SITUATIONS MATTER: UNDERSTANDING HOW INDIVIDUAL BROWSING
SITUATION ROUTINENESS IMPACTS ONLINE USERS’ ADVERTISEMENT CLICKS
BEHAVIOR
Xianghua Lu
School of Management, Fudan University
Shanghai 200433, China
Yuxin Chen
NYU Shanghai
1555 Century Avenue, Shanghai 200122, China
ABSTRACT
Targeting the most valuable users and generating advertisement clicks from them is a veritable challenge for
online advertisers. Most of the extant research about behavioral targeted advertising focuses on matching
advertisements with the right people by profiling users’ latent interests. This study posits that users’ individual
browsing situations should also be examined with relation to the advertisement clicks. We introduce a new angle—
browsing situation routineness—to explore the impact of browsing situations on users’ advertisement clicks.
Supported by a unique clickstream data set, our empirical study indicates that consumers are more likely to click the
advertisements when they visit websites with high content routineness (CR), particularly in leisure-oriented websites.
However, time routineness (TR) works in the opposite manner; users tend to click fewer advertisements during their
routine browsing time, compared to non-routine browsing times. We also find that CR and TR have a complementary
joint effect to increase click-through rate. The present study contributes an alternative dimension—targeted advertising
based on individual browsing situation routineness—to research and practices about behavioral targeted advertising.
Keywords: Targeting; individual; Browsing behavior; Routineness; Online advertisement
1. Introduction
Online targeted advertising has been increasingly used to deliver advertisements to users who are most likely to
be influenced by them. Examples of recent innovations in online targeted advertising include behavioral targeting
(BT) (matching advertisements to a user based on her/his historic online behavior data), contextual targeting (matching
advertisements to a user based on the web content that she/he is viewing), and IP-based targeting or geo-targeting
[IAB 2010; Lambrecht & Tucker 2013]. In particular, BT has become the most popular approach to match
advertisements with the right people by using consumers’ collected behavioral data [Yan et al. 2009]. Various BT
approaches have been developed, such as retargeting [Lambrecht & Tuck 2013], browsing frequency-based targeting
[Lu et al. 2016], interests label system-based targeting [Lian & Xu 2018], and life stage mining methods [Bahari et al.
2015], to match advertisements with the right people by exploring users’ latent interests. While these extant BT
algorithms focus more on users’ personal profile, this study posits that users’ individual browsing situations also
matter on BT.
Research indicates that the situation around us is constantly affecting how we think and act [Sommers 2012;
Cialdini 2008], contexts of all types—where we are, whom we are with, and even whether we are in a hurry—change
our cognitive ability, and influence our behavior. Taking online advertisements as an example, the Wall Street Journal
(2016) reported that mobile and social-media advertising start displaying advertisements at a very unusual time slot,
late at night1. The browsing situation at midnight is perceived as a relatively quiet and minimally interruptive personal
time for most users. Would users click more advertisements during this period of inactivity compared to routine times?
In this study, driven by the importance of the browsing situation, we investigated users’ behavioral responses to the
advertisements in different situations. In particular, we were interested in the perspective of browsing situation
1 It is largely due to the increasing number of people staying awake at midnight in the current “always online” age.
Lu & Chen: Situations Matter
Page 114
routineness, which captures the online users’ familiarity on a specific browsing situation, and examines its relationship
with users’ advertisement click behavior.
Garfinkel (1964: 244) defined routineness at the individual level is the activities by which “persons discover,
create, and sustain standardization”. By definition, routineness means low variability and high predictability [Perrow
1967]. Behavioral data collected by online intermediaries can provide us with a unique empirical context to distinguish
users’ routine browsing situations from their non-routine situations, and enable us explore how the browsing situation
affects users’ click behavior. The authors of this study developed two measurements for individual browsing situation
routineness: browsing content routineness (CR) and time routineness (TR). Unlike the extant behavioral targeting that
pays more attention on what users browsed in the past, CR and TR extend the browsing behavior into where and when
the browsing behavior happens, and provide more perspectives for the behavioral targeting advertising research and
practices. Our empirical results indicate that consumers are more likely to click the advertisements on websites with
high CR, particularly in leisure-oriented websites. However, the TR works the opposite way: users tend to click more
during non-routine times. We also find that TR and CR have a complementary joint effect that increases click-through
rate (CTR).
This study offers several implications for research and practice. First, we provide a new angle (i.e., individual
browsing routineness) for online behavioral targeting research and practice. Browsing situation routineness can
capture the individual browsing context of each user for every advertisement’s exposure, thereby allowing advertisers
to link the right advertisements with the right browsing situation. Advertisers may improve their targeting precision
by combining this new perspective and achieve higher revenue from online advertisements. Second, traditional BT
practices emphasize matching advertisements with users’ latent interests profiles. The authors of this study observed
that high individual browsing situation routineness can improve the CTR without involving any complicated interests
profile computing and matching process. This result demonstrates that browsing situation creates click value for online
advertisements. This can provide the advertisement platforms or content websites with a new pricing method to display
advertisements by analyzing their users’ browsing routineness. Third, most published BT studies have focused on
targeting the right people; the appropriate times to display the advertisements have seldom been considered. In this
study, we fill this gap by investigating the effects of users’ browsing TR on CTR. The findings indicate that
advertisements that reach the audience during non-routine times can have higher performance than those that target
TR.
The paper is organized as follows: In Section 2, we present an overview of literature that is relevant to our
research. Section 3 introduces the research context and the operationalization of browsing situation routineness.
Section 4 presents our empirical model and our analysis of the results, followed by the robustness analysis at Section
5 and a discussion of theoretical and practical implications of this research at Section 6.
2. Literature Review
This study is relevant in two streams of literature: studies about online BT and studies about situational factors.
In addition, we introduce the optimal stimulation level theory to explain why situational factors influence users’
behavior.
2.1. BT
BT refers to approaches where online advertisements are delivered to a user based on their historic online behavior
data. Advertising intermediaries use consumer-tracking technologies, such as cookies, beacons, and video cookies, to
record consumers’ online behavioral information. According to a study by The Wall Street Journal, the 50 most
popular websites in the United States installed more than 3,000 tracking files on a single computer [Angwin 2010].
Data are collected to track most aspects of users’ online behavior; pages visited, browsing trails, searches, and click-
troughs to predict users’ interests.
Yan et al. (2009) examined whether and how BT could help online advertising using real-world advertisement
click-through logs. They segmented users based on their past browsing and search behavior and compared
advertisement responses across segments. Their results suggest that if BT is used, CTRs of online advertisements can
be improved by as much as 670%. Goldfarb and Tucker (2011) noted that consumers’ purchase intent was lower after
viewing online display advertisements in the countries that enforced privacy regulations that restricted advertisers’
ability to collect consumer data used for targeted advertising. These results prove the value of consumer data for BT.
Hoban and Bucklin (2015) conducted an online randomized field experiment and noted that targeted advertising
positively affects visitation to the firm’s website for users in most stages of the purchase funnel. Moriguchi et al.
(2016) later found that targeted advertising is effective in the lower purchase funnel, such as the search and evaluation
stage, but ineffective in the upper funnel, where customers only browse the products.
The aforementioned studies empirically illustrated that overall BT could have a favorable effect on consumers’
actions. However, the effects of BT are heavily dependent on the algorithms used by advertisement platforms. Trusov
Journal of Electronic Commerce Research, VOL 21, NO 2, 2020
Page 115
et al. (2016) pointed out that the collection of consumer information is easy but the extraction of meaningful behavior
patterns from such data remains a largely open question. Most of the current BT methods use visit frequency to proxy
users’ interests or familiarity with a certain area, and attach a relevant label on users [Lu et al. 2016]. For example, if
customer A has browsed automobile-related content 10 times and customer B has only browsed it 3 times, customer
A would be defined as “interested in cars”, whereas customer B may not. Therefore, in the future, customer A is more
likely to be exposed to automobile-related advertisements than customer B.
However, this browsing frequency-based targeting strategy could be biased as it ignores users’ overall browsing
behavior. Taking customer A and customer B as examples again, if customer A’s total browsing is 1,000 pageviews,
then 10 visits to automobile-related pages occupy 1% of customer A’s browsing history. Suppose customer B viewed
100 pages but visited automobile-related pages 3 times. Therefore, automobile-related pages occupy 3% of customer
B’s browsing history. From this perspective, customer B could have higher familiarity or routineness towards
automobile than customer A (3% > 1%). Therefore, ignoring the overall pattern of users’ browsing behavior could
cause misleading results when predicting users’ true interests. In addition, pushing the advertisements that are relevant
with users’ past behavior or interests may heighten users’ awareness of online tracking and privacy violations. Users
tend to have stronger reactions when they view advertisements that are closely related to their past behavior [Lu et al.
2016].
Due to the problems related to frequency-based BT, some increasingly comprehensive targeting algorithms have
been developed. Bleier and Eisenbeiss (2015) proposed a two-dimensional conceptualization of advertisement
targeting: targeting depth defines how closely the advertisement reflects a consumer’s interests, and targeting breadth
determines how completely the banner reflects these interests. The consumers’ click-through intentions with
advertisements targeting depth and breadth were examined. Trusov et al. (2016) developed a correlated topic model-
based algorithm to uncover individual user profiles from online browsing data, which could summarize a consumer’s
interests and preferences efficiently.
Although the aforementioned studies sought to profile users’ latent interests precisely and comprehensively for
the purpose of BT, BT research is still at its nascent stage. This study proposes that aside from users’ inherent interests
or personalities, the individual browsing situation of each user when they are exposed to the advertisements would
also affect advertisement click behavior. Therefore, in this study, we will extend the literature about online
advertisements targeting from a new angle (i.e., the individual browsing situation), and examine how this new
perspective might influence users’ advertisement click behavior.
2.2. The Impact of Situational Factors
Situational factors are described as “factors particular to a time and place of observation…and which have a
demonstrable and systematic effect on current behavior” [Belk 1975]. Situational factors can be classified into five
groups: physical surroundings; social surroundings; temporal perspective; task definition; and antecedent state [Belk
1975]. These factors can temporarily alter users’ preferences, attitudes, or intentions, thereby changing their permanent
perspective into a situational perspective, and therefore, transforming their behaviors [Simon & Usunier 2007]. For
example, a user who is not in a hurry may react quite differently to a service encounter than a user who is feeling
rushed. By including factors specific to particular times and places, advertisers may be able to fully understand the
motivations and behaviors that influence users’ advertisement click behavior.
The importance of situational factors on decision making or human behavior has been explained in numerous
theories, such as the organizational contingency theory [Fiedler 1964] and situational leadership theory [Hersey &
Blanchard 1969], which claims the optimal course of action is contingent upon the internal and external situation. In
research about individual decision making, situational factors were also found to have a constant effect on how we
think and act [Sommers 2012; Cialdini 2008; Li et al 2016]. Some browsing situations even provide greater predictive
power than demographic or personal factors alone [Belk 1975; Stanton & Bonner 1980; Sommers 2012]. For example,
in a study about helping behavior, Darley and Batson (1973) noted that after controlling participants’ dispositional
factors, only 10% of participants in situations with limited time would offer help, whereas 64% of participants would
offer help in situations with unlimited time. In another study about the adoption of self-service technology at retail
stores, Collier et al. (2015) noted four situational factors that affected user behavior: order size, wait-time tolerance,
location convenience, and employee presence. However, in research about online targeted advertising, to the best of
our knowledge, only a few studies have incorporated situational factors.
Contextual targeting may partially take into account the context of advertisements displayed by delivering online
advertisements based on the web content that users are viewing [Goldfarb & Tucker 2011; Zhang & Katona 2012].
The assumption of contextual targeting is that users who browse a web page about automobiles are more likely to be
interested in cars than those browsing a web page about health care. However, this type of contextual targeting entirely
neglects users’ past experiences, because automobile advertisements will be displayed to all users that visit automobile
web pages; such a policy is not sufficient to capture users’ heterogeneity. Different people may perceive and respond
Lu & Chen: Situations Matter
Page 116
to the same situation differently because of their discrepant backgrounds and experiences. Therefore, in this study, we
propose a new angle called the individual browsing situation, which attempts to integrate users’ individual experiences
with their situation, and examines how this individualized browsing situation will affect users’ motivations and
advertisement click behaviors.
Based on the features of online targeted advertising, we operationalize the particular individual browsing situation
from two aspects: browsing CR and browsing TR. Routineness is defined as “repetitive, recognizable patterns of
interdependent actions, carried out by actors” ([Feldman & Pentland 2003]. Routineness is a direct reflection of
experience because it is the learning brought about by the repetition of similar problem-solving actions [Freeman &
Hannana 1983]. By definition, routineness indicates low variability and high predictability [Perrow 1967]. In this
study, we use this concept to proxy users’ familiarity for a particular browsing situation. The routineness value of each
browsing is calculated by the similarity of new browsing situations with users’ past browsing experience.
Nonetheless, routineness is not a static concept. Sacks (1984) described the remarkable lengths to which people
go to believe that even the most calamitous situations (e.g., airplane hijackings) are ordinary. In the context of our
research, if a user always stays awake at midnight, the browsing behavior at midnight is not considered non-routine.
Therefore, each user may have their own individual browsing situational routineness value at a specific place and
time, and this value always changes over time.
2.3. The Optimal Stimulation Level Theory
The optimal stimulation level theory could be used to illustrate the underlying rationale that individual browsing
situational routineness can affect users’ advertisement click behavior. The optimal stimulation level theory postulates
that individual behavior is influenced by the intrinsically motivated desire to reach a specific level of stimulation,
which is termed “optimum stimulation level” [Berlyne 1960]. Consumer research has established that this level of
optimum stimulation varies between individuals [Raju 1980]. When the stimulation derived from the environment is
too low, individuals will attempt to increase stimulation. When the stimulation is too high, individuals will attempt to
decrease stimulation. Psychological pleasantness is highest at the optimal stimulation level, the level of stimulation at
which a person feels most comfortable [Steenkamp & Baumgartner 1992].
Behavior, such as risk-taking, variety-seeking, or curiosity-motivated behavior, aimed at modifying
environmental stimulation to the optimum level of stimulation, is called “exploratory behavior”. In numerous settings,
it has been noted that these general explorative behaviors affect the willingness to choose new or familiar products on
a repurchase occasion, to click repeated advertisements [Baumgartner & Steenkamp 1996], or to seek extra
information [Raju 1980].
In the context of our research, we use this theory to explain how high or low browsing situation routineness affects
users’ advertisement click behavior. We postulate that when users are in highly routine situations (environments)
where they feel that the situation is familiar, the stimulation from this new situation could be low; therefore, users may
like to seek more variety, or become more curious about the information provided by advertisements, and therefore,
be more likely to click the advertisements. Whereas in non-routine situations, users’ environmental stimulation may
already be very high; therefore, users will be less likely to click the advertisements to search extra information. Thus,
we postulated the following hypotheses:
H1: Users are more likely to click advertisements on the high browsing CR web pages than those on the lower
browsing CR web pages.
H2: Users are more likely to click advertisements during the high browsing TR periods than the lower browsing
TR period.
H3: Browsing CR and TR have a positive complementary effect in terms of the increase in click probability for
advertisements.
3. Data
3.1. Research Context
The aim of this study is to explore the impact of browsing situation routineness on users’ click behavior. We
obtained data from one of the largest online advertising intermediaries in China. This company provided three months
[from 1 May 2014 to 31 July 2014] of website clickstream data from 139,000 randomly drawn users. When users
browsed the network websites of this company, relevant advertisement was displayed on the webpage. The contents
of advertisements are various, range from automobile, cosmetics, FMCG et al. Very few optimization algorithm was
adopted during our data collection period, therefore, the same ads would be displayed if users visit the same website
at the same time. The intermediary company collected the information of each advertisement display, includes users’
cookieID, browsing time, browsing web page URL that advertisements were displayed on, advertisement IDs, web
domain, and advertisement click behavior. These users have generated a total of approximately 803,000
advertisements display and click observations.
Journal of Electronic Commerce Research, VOL 21, NO 2, 2020
Page 117
We excluded the website browsing observations occurred in mobile devices (less than 2% of the whole dataset)
because users may have different click proneness when using mobile phone to browse online advertisement. In
addition, to ensure each user has sufficient browsing history, we only keep users who have more than 15 observations
in our dataset, and we eventually have about 6,000 users with total 422,000 observations, which every user had
approximately 72 observations.
A unique feature of this dataset is that the intermediary conducted a keyword frequency count for each web page,
therefore, we have the keywords’ information for each URL. For example, for the web page “http://2014.ifeng.com/”,
the intermediary extracted 10 keywords and their frequencies, (i.e., Germany: 20; Soccer:4; Wechat: 3; Wechat
Moments: 2; Beijing: 2; Columbia: 1; Weibo: 1; Music: 1; Japan: 1; Europe: 1). These keywords allowed us to measure
the CR of each user’s individual browsing situations, which we elaborate in the following section.
3.2. Data Pretreatment
From the 146,516 web pages in the data set, 2,233 unique keywords were extracted. With these keywords, we
were able to aggregate web pages into some groups by calculating the content similarity of each web page. Ideally,
we would directly cluster these web pages with these 2,233 keywords. However, due to numerous missing values, this
146,516 × 2,233 matrix was a very sparse matrix; the clustering results would have been biased if we had conducted
the clustering based on this sparse matrix. Therefore, we conducted two steps of pretreatment before the clustering:
First, in the web page dimension, instead of focusing on individual web page levels, we used the domain
information as the analysis level. The domain information of the browsed URL is usually included in the clickstream
data. In our case, approximately 172 web domains were involved. Web domains include certain channel information;
for example, the website ifeng.com has more than 10 domains, such as news.ifeng.com and sports.ifeng.com. An
analysis at the domain level may help us aggregate the web page into a higher level without losing too much
information.
Second, for the dimensions of keywords, we combined the 2,233 keywords into 352 keyword groups. We intended
to combine keywords with the same meaning but different expressions. For example, terms such as Germany and
German were combined into one keyword. We also combined the names of movie stars as one group. Three MBA
students were hired to perform this job and were asked to group the keywords based on their own understanding.
During the grouping task, for any keywords that were ambiguous or controversial, we used a voting mechanism to
decide the final results. Finally, we had 352 unique keyword groups.
From this new 172 × 352 matrix, we adopted the term frequency–inverse document frequency (TF-IDF) algorithm
to compute the similarity of keywords across each web domain, and the Gaussian mixture model (GMM) generated
39 content groups from these web domains. Appendix 1 presents the algorithm used to cluster the web pages. Table 1
lists the typical web domains that belong to the top six content groups. We label these six groups with a possible tag
for further discussion.
Lu & Chen: Situations Matter
Page 118
Table 1. Clustering Results of the Top Six Content Groups
Rank of
content
groups
Examples of included domains Possible tag
Group1
tv.67.com;news.67.com;gb.cri.cn;music.67.com;emotion.miss-no1.com;
mz.mop.com;news.eastday.com;big5.chinabroadcast.cn;enjoy.eastday.com
;tt.mop.com;www.gucheng.com;www.xici.net;big5.cri.cn;
roll.eastday.com;baidu.baomihua.com;joy.online.sh.cn;society.eastday.com
;star.67.com;yule.jinti.com
Entertainment
related
Group 2
www.quanjing.com;quanjing.com;sh.eastday.com;www.ddmap.com;
history.eastday.com;sheying.eastday.com;beijing.jinti.com;changsha.ddmap.com
Location
related
Group 3
newcar.xcar.com.cn;auto.163.com;topic.qc188.com;www.qc188.com;
info.xcar.com.cn;iask.qc188.com;config.qc188.com;drive.xcar.com.cn;
www.3quan.com;news.52che.com;club.qc188.com;auto.cri.cn;
review.qc188.com;video.qc188.com;gai.qc188.com;auto1.cri.cn
Auto related
Group 4
health.miss-no1.com;iask.qc188.com;enjoy.eastday.com;pk.qc188.com;
shenzhen.jinti.com;mil.eastday.com;www.ddmap.com;photo.jschina.com.cn
;yule.jinti.com;collection.aili.com
Virtual
community
Group 5
2014.ifeng.com;pinglun.eastday.com;finance.gucheng.com;
www.etnet.com.cn;emotion.mop.com;mil.eastday.com;column.etnet.com.cn
;forex.gucheng.com;world.eastday.com;pic.miss-no1.com;cat.abang.com
Economy and
finance related
Group 6
www.miss-no1.com;beauty.aili.com;lifestyle.aili.com;health.aili.com;
zhuangban.onlylady.com;love.aili.com;celeb.aili.com;fashion.aili.com;
bbs.aili.com;auto.aili.com;cosmetics.aili.com;family.aili.com;art.aili.com
Woman related
It should be noted that each domain does not exclusively belong to one cluster. We computed the probability of
each domain belonging to these 39 content groups, and got a 172 × 39 probability matrix 𝑃(𝐷𝑚 ∈ 𝐶𝑘)(𝑚 =1,2 … 172, 𝑘 = 1,2 … 39) after the GMM calculation. The probability vector for the domain “health.miss-no1.com”
is presented in Table 2 as an illustrative example. Using the probability matrix in the clustering process enabled us to
capture the different content perspectives for a specific domain, and preserve more information than the dummy
variable method2. For the purpose of comparison, we also conducted a robustness analysis of the dummy variables.
Table 2. Probability Vector of the Web Domain
domain V1 V8 V2 V6 V3 V22 V5 V31 V33
health.miss-
no1.com 0.00074 0.03784 0.00003 0.00001 0.00004 0.95763 0.00123 0.00240 0.00009
3.3. The Operationalization of Variables
We used two variables to measure the routineness of users’ browsing situation: CR and TR. CR measures the
degree of content familiarity when users browse a new web page. TR measures the frequency with which users
routinely browse websites during a specific time slot. We also measured the users’ non-routineness behavior by coding
whether it is the first time that users visited this type of content, and whether it is the first time that users browsed the
Internet at this time slot.
We used the data from the first month (2014 May) as the historic information baseline for the calculation of June
and July’s routineness value. The routineness value of May could be biased without the historic data; therefore, we
did not examine the relationship between the routineness value of May with users’ click behavior, only the data from
June and July were used in our empirical analysis.
2 In the case of health.miss-no1.com, we set the dummy v22 = 1 and other 38 content groups as 0 under the dummy variable
coding method.
Journal of Electronic Commerce Research, VOL 21, NO 2, 2020
Page 119
3.3.1. Individual Content Routineness of Browsing Situation
Let 𝐶𝑘 denote the kth content group (𝑘 = 1,2,3, … ,39), with i as the user. Based on the data from May 2014, we
could have 𝜃𝑖𝑘 =∑ 𝑁𝑖𝑑∗𝑃(𝐷∈𝐶𝑘)172
𝑑=1
𝑁𝑖 as the ratio of user i’s browsing frequency at the kth content group among all their
browsing. 𝑁𝑖𝑑 is the number of visits on domain D, and 𝑁𝑖 is the total visit time during May. 𝑃(𝐷 ∈ 𝐶𝑘) is the
weight to adjust the value for each domain on 39 content groups. The matrix 𝜃𝑖𝑘 represents user i’s share of visits
across 39 content groups in May; we used this matrix to proxy users’ baseline familiarity on each content groups.
When user i browses a new web page in June and July, we compute a CR value to reflect how familiar this user
is with this web page’s content. If user i has high familiarity with one specific content group k with high 𝜃𝑖𝑘, and
simultaneously, the web page domain they are visiting has high relevant probability on content group k with higher
𝑃(𝐷 ∈ 𝐶𝑘), then user i should have a high CR value for this browsing situation. We set the CR value for 𝐶𝑅𝑖𝑚 as:
𝐶𝑅𝑖𝑚𝑡 = ∑ 𝑃(𝐷𝑚𝑡 ∈ 𝐶𝑘) ∗ 𝜃𝑖𝑘39𝑘=1 (1)
where m is the domain ID that user i is browsing at time t. For example, if user i’s baseline familiarity on three
content groups (i.e., 𝜃𝑖𝑘) are (0.1, 0.6, 0.4) based on the data from May 2014. This user visits two domains, A and B,
on June 2014, and the relevant probability of A and B on these three content groups are (0.2, 0.4, 0.4) and (0.1, 0.8,
0.1), respectively. With the aforementioned equation, user i’s routineness value on domain A is (0.2, 0.4, 0.4) × (0.1,
0.6, 0.4)’ = 0.43, while their routineness value on domain B is (0.1, 0.8, 0.1) × (0.1, 0.6, 0.4)’ = 0.53. Therefore, user
i has higher CR on domain B than domain A.
After each calculation of routineness value, we also need to update 𝜃𝑖𝑘 to incorporate user i’s new browsing
behavior.
𝜃𝑖𝑘𝑛𝑒𝑤 =
∑ 𝑁𝑖𝑑∗𝑃(𝐷∈𝐶𝑘)172𝑑=1
𝑁𝑖+1 (2)
For the next round of the routineness value calculation of user i, we need replace 𝜃𝑖𝑘 with this updated 𝜃𝑖𝑘𝑛𝑒𝑤
in Equation (1).
With the aforementioned algorithm, we can compute a unique CR value for each user on each new browsing
situation in June and July.
3.3.2. Individual Time routineness of Browsing Situation
The TR of browsing behavior measures the frequency with which a user routinely browses websites during a
specific time slot. In this study, we set each hour as a time slot Hh (h = 1,2….24). We used the data from May to
calculate the baseline TR for each user at each hour, the vector 𝜋𝑖ℎ =𝑁𝑖ℎ
𝑁𝑖 (h = 1,2….24) captures user i’s browsing
situation routineness distribution among all their browsing during 24 hours.
We use Equation (3) to calculate the TR value for June and July’s observation. If user i visits a web page at time
t (belongs to time slot 𝐻𝑡), we retrieve the 𝜋𝑖ℎ vector for each user, and compute the distance between t with h to
proxy the probability of t belonging to time slot h.
𝑇𝑅𝑖𝑡 = ∑ 𝑃(𝑡~ℎ) ∗ 𝜋𝑖ℎ23ℎ=0 (3)
In Equation (3), we set up the probability function 𝑃(𝑡~ℎ) to compute the distance between t and h.
𝑃(𝑡~ℎ) =exp (−β𝑑ℎ)
∑𝑚exp (−β𝑑ℎ) (4)
where h = 0,1,…23, Σm is the summation of m = 0,1,…23, dh is the minimum distance between h + 0.5 and t
with hour as the unit, and β > 0. For example, if t = 0:10, d0 = 0.5 − 1/6 = 1/3 (10 minute equals to 1/6 hour), d1 = 1.5
− 1/6 = 1 1/3, d2 = 2.5 − 1/6 = 2 1/3,… d11 = 11.5-1/6 = 11 1/3, d12 = 11.5 + 1/6 = 11 2/3, d13 = 10.5 + 1/6 = 10
2/3,… d23 = 0.5 + 1/6 = 2/3. With the aforementioned specifications, P (t~h) is a probability measure with Σh P (t~h)
= 1. β is the weight of the distance between t on each h. When β is larger, the higher weights are on closer time slots.
We assume β = 1 to begin with and then check β = 0.5 as robustness check.
After computing the TR value 𝑇𝑅𝑖𝑡, we update 𝜋𝑖ℎ again for the next round of calculation of user i’s TR value.
𝜋𝑖ℎ𝑛𝑒𝑤 = {
𝑁𝑖ℎ
𝑁𝑖+1 𝐻𝑡 ≠ ℎ
𝑁𝑖ℎ+1
𝑁𝑖+1 𝐻𝑡 = ℎ
(5)
3.3.3. Non-Routineness Measurement of Browsing Situation
The aforementioned two variables capture the individual routineness for each browsing situation. For the purpose
of comparison, we also coded two non-routineness variables: hourfirst and webfirst. Sommers (2012) indicated that
being the first instance is a good measurement of an exceptional situation. Users tend to be more cautious when they
Lu & Chen: Situations Matter
Page 120
are in an exceptional situation, and are less likely to click the advertisements to increase their stimulation level. We
used these two variables to capture whether users would have different advertisement click behavior in non-routine
browsing situations. The variable hourfirst indicates whether user i visited this web domain for the first time at time
slot h. The variable webfirst indicates the first time user i visited a web domain from a content group that they have
visited at time t.
3.3.4. Control variables
In addition to our focal variables, CRit, TRit, and two non-routineness variables, we included several groups of
control variables. The first group of variables indicated users’ past browsing frequency on this type of web domain.
Chatterjee et al. (2003) and Braun and Moe (2013) have indicated that consumers might respond differently when
they visit a website for the first time or after several times. Multiple exposures may enhance the impression of
advertisements and lead to a higher click rate. Therefore, we included two variables, ImpCidVisitit and ClkCidVisitit,
into our model as users who repeatedly viewed or clicked on an advertisement might respond differently to those who
occasionally saw it. ImpCidVisitit is the total previous frequency of user i for the web domain they are browsing at
time t. ClkCidVisitit is the total previous click frequency of user i for the web domain they are browsing at time t.
The second group of variables was the visitors’ proneness to clicking on the advertisements. We characterized a
visitor’s inherent disposition to click using the variable adClickProneid(t), which is measured by the total clicks of
user i divided by their total visits in content group d(t) when they are browsing at time t. It is an aggregated variable
that we combine the first month click and browsing behavior of each of the 39 content groups together to reflect users’
click tendency.
We also coded two user-related variables to reflect users’ idiosyncratic characteristics. One variable is Cityid,
whereas the other variable is the browser that the user is using to browse the website. We set Cityid = 1 if a user was
located in a big city such as Beijing or Shanghai; and Cityid = 7 if a user was located in a small village3. According to
a survey, users’ preferred Internet browsers can indicate their age; Internet Explorer users are usually older, whereas
Chrome users are usually younger: (https://www.elie.net/blog/web/survey-internet-explorer-users-are-older-chrome-
seduces-youth). Therefore, we coded browser = 0 to represent Internet Explorer and browser = 1 to represent Chrome
and Firefox, whereas browser = 2 represented Safari and Opera. Table 3 compiles the definitions of the main study
variables.
Table 3. Definitions of the Main Variables
Variable Type Variable Name Definition
Dependent
variable Clickit
If user i clicks the advertisements on the web page that they are
browsing at time t, Clickit = 1, otherwise Clickit = 0
Independent
variables
CRit The CR of user i when they are browsing a web page at time t
TRit The TR of user i when they are browsing a web page at time t
hourfirstit The non-routineness value of user i for time slot 𝐻𝑡
Webfirstit The non-routineness value of user i for the web domain they are
browsing at time t
Control
variables
ImpCidVisitit The browsing frequency of user i for the web domain they are
browsing at time t
ClkCidVisitit The advertisement click frequency of user i for the web domain they
are browsing at time t
𝑎𝑑𝐶𝑙𝑖𝑐𝑘𝑝𝑟𝑜𝑛𝑒𝑖𝑡 The advertisement click proneness of user i for the content group
they are browsing at time t
User
characteristics
Cityidi Cityid = 1 if a user is located in a big city, and Cityid = 7 if the user
is located in a small village
Browersi 0 represents IE, and 1 represents Chrome and Firefox, whereas 2
represents Safari and Opera
3.4. Descriptive Summarization
Table 4 reports the descriptive summary for the main study variables. We observed that, on average, these users
click 7.1% of the advertisements displayed to them. The average CR is 0.62, which suggests that 62% of the browsing
content is similar with those in the user’s browsing history; this finding is consistent with that of research suggesting
users’ browsing behavior maintains a stable pattern (Montgomery, 2000). The average TR of consumers is much
3 The city code information was retrieved from http://j.news.163.com/docs/17/2016031611/BI9GLPKE05248EIC.html
Journal of Electronic Commerce Research, VOL 21, NO 2, 2020
Page 121
lower, only 13.07% of the browsing time has similar routineness, indicating that an individual’s browsing time varies
significantly each time.
Table 4. Descriptive Summary of the Main Variables
Mean SD Min. Max.
Click 0.0708 0.2565 0 1
CR 0.6204 0.3700 0 1
TR 0.1307 0.0937 0 0.462
Webfirst 0.161 0.367 0 1
Hourfirst 0.337 0.47 0 1
ImpCidVisit 63.2607 150.8246 1 7358
ClkCidVisit 2.9385 10.7428 0 276
adClickProne 0.0330 0.0930 0 1
Browser 0.7723 0.4656 0 2
CityID 3.2878 2.2662 0 7
Table 5. Correlation Matrix of the Main Variables
Click CR TR Hourfir
st
webfirst
t
Imp_ci
dvisit
Clk_cid
visit
adClick
Prone
Browse
r CityID
Click 1
CR 0.0999 1
TR 0.0448 0.0518 1
Hourfirst 0.0717 0.0965 0.3126 1
Webfirst -0.061 -0.5431 0.0303 -0.2915 1
ImpCidVisit 0.152 0.27 -0.0507 0.266 -0.1867 1
ClkCidVisit 0.1443 0.2444 -0.0517 0.236 -0.1811 0.8874 1
adClickProne 0.467 0.0195 0.0393 0.054 -0.0137 0.0707 0.0589 1
Browser -0.023 0.0616 -0.0474 -0.0834 -0.0169 -0.0387 -0.0314 -0.015 1
CityId 0.0014 -0.0958 0.0325 -0.0622 0.0791 -0.1264 -0.1168 -0.0003 -0.0032 1
Figure 1 presents the frequency distribution of browsing time; 8:00 AM and 9:00 AM are the peak times when
users browse the Internet. This finding is consistent with that of Internet traffic analysis reports (Brouwer, 2014)4,
mostly because this period is usually when we prepare to start work.
Figure 1. Browsing Frequency Distribution of Browsing Time
4 http://insights.wired.com/profiles/blogs/linking-browsing-patterns-and-user-conversions#axzz4LcZyBf4l
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1 8 1 9 2 0 2 1 2 2 2 3
May June July
Lu & Chen: Situations Matter
Page 122
4. Research Model and Estimation Results
We used a random-utility framework to model a user’s click decision. A user clicked on an advertisement when
their clicking utility exceeded a threshold level. In particular, for user i viewing the web page at time t, we used Clickit
to denote the user’s click binary response on the advertisements that displayed on this web page and uit to denote the
latent value. In the utility equation, in addition to the independent and control variables introduced earlier, we also
controlled the fixed effect of the content group by including the dummy variables ( tCgroup ), because users’
advertisement click tendency is not the same across different content groups. For example, research has indicated that
users are more likely to click advertisements on entertainment websites compared to economy-related websites
(Brouwer, 2014). The fix effect of 24 hour slot (ht) is also controlled to eliminate the impact of time on click
probability. it represents the random error that follows the type-I extreme distribution. The click equation is defined
as:
*0 1 2 3 1 2
3 4 7 t
1[ 0],
Im h
it it
it it it it it it it
it it i t it
Click u
u CR TR CR TR Hourfirst webfirst
pCidVisit ClkCidVisit Clickprone Cgroup
4.1 Main Results
Table 6 reports the main results of our logit regression model. Model 1 is the baseline without considering the
effects of users’ routineness variables. Model 2 is the main model that did not include users’ idiosyncratic variables.
Model 3 is the full model. We observed that the full model increased the hit rate of users’ click behavior from 18.99%
to 19.45%, suggesting that including individual browsing routineness variables can improve the prediction power.
Table 6. Logit Regression Results and Model Comparison
VARIABLES Model 1 Model2 Model 3
Routineness
variables
CR 0.710*** 0.710***
(0.0440) (0.0441)
TR -1.179*** -1.278***
(0.212) (0.212)
CR*TR 2.175*** 2.234***
(0.244) (0.244)
Non-routineness
variables
webfirst -1.353*** -1.080*** -1.078***
(0.0334) (0.0346) (0.0346)
hourfirst 0.241*** 0.261*** 0.273***
(0.0146) (0.0151) (0.0151)
Control
variables
ImpCidVisit 0.000612*** 0.000524*** 0.000583***
(3.46e-05) (3.50e-05) (3.53e-05)
ClkCidVisit 0.0115*** 0.0110*** 0.0112***
(0.000437) (0.000437) (0.000439)
adCliclprone 11.68*** 11.79*** 11.77***
(0.0582) (0.0587) (0.0589)
Users
demographic
variables
Browser dummy Included
Cityid dummy Included
GroupID Included Included Included
Hour fixed
effect
Included Included Included
constant -7.673*** -8.123*** -8.153***
(0.190) (0.193) (0.198)
Observation 801,998 801,998 801,998
Likelihood -140918.12 -140762.02 -140479.7
Hit rate(p=1) 18.99% 19.24% 19.45%
Standard errors in parentheses*** p < 0.01, ** p < 0.05, * p < 0.1
The results of Model 2 and Model 3 indicate a similar pattern. We interpret our results mainly based on Model
3. The results show that the coefficient of browsing CR was significantly positive, which suggests that users are more
Journal of Electronic Commerce Research, VOL 21, NO 2, 2020
Page 123
likely to click the advertisements if they are browsing higher CR websites. More specifically, with one unit increase
in users’ CR, the probability of users click on an advertisement will increase by 67% (= exp (0.710)/(1 + exp (0.710))).
This result echoed by the negative and significant non-routineness content browsing variable (webfirst); users are less
likely to click the advertisements for the web domain they first visited compared to the domains they have visited
before, thereby H1 is supported. This result is consistent with the optimal stimulation level theory that users tend to
be more active to seek new information or click advertisements when they are in an environment where stimulation is
low. However, in a new situation, where the stimulation level is already high, they are less likely to seek extra
information because when users visit a new web domain, they tend to have a very direct goal, and are, therefore, also
less likely to click the advertisements (Lu et al. 2016).
However, H2 was not supported by our empirical results. We observed that the coefficient of browsing TR was
significantly negative, whereas the coefficient of the non-routineness browsing time variable (hourfirst) was
significantly positive, which suggests that users are less likely to click advertisements during the routine browsing
time, but would be more willing to click if they first browse the Internet during this time slot. Therefore, the optimal
stimulation level does not apply to the click behavior of TR. This could be because, unlike higher CR, it can represent
higher familiarity with the browsing environment. The higher TR does not necessarily lead to low stimulation level.
In other words, time may not be one of the environmental factors that trigger explorative behavior to increase or
decrease the user’s stimulation level. The non-routine browsing time may be a newly available time for users. Users
tend to be less occupied, and therefore, more inquisitive about the new information provided by advertisements. This
result supports the practice that advertisers display advertisements late at night (Wall Street Journal, 2016). Because,
for most users, late night is a non-routine browsing time, and is also available as personal time with relatively few
interruptions. During this period, users are more likely to click advertisements to acquire new information.
The interactive term for CR and TR is significantly positive. This result indicates that even though TR negatively
affects users’ advertisement click behavior, it can enhance the effects of CR on click probability, thus H3 is supported.
This result suggests that if users visit a routine website at a routine time, they are more likely to click the
advertisements, compared to the situation where users are visiting a routine website at a non-routine time. To further
support this argument, we use the variable hourfirst as a proxy of non-routine time. In line with H3, the impact of CR
should be negatively moderated by the non-routineness browsing time variable (hourfirst). We replace CR*TR with
the interactive term between CR and hourfirst, and run the Model 3 again. The coefficient of the new interaction term
is -0.417***(S.E.=0.085), which strongly supports H3 again from the reversed perspective.
For the control variables, users’ past visits and click frequency were assumed to have a significantly positive
effect on their click probability, which suggests that the traditional frequency-based BT algorithm also works in our
study. In addition, the click proneness of a user was positively correlated with their click behavior, suggesting that
users were heterogeneous and click proneness could be a good predictor of advertisement clicks.
4.2 The Differential Effect of Browsing Situation on Content Groups
The estimation result indicates that the fixed effect of content groups plays an important role in users’ click
behavior. Therefore, in this section, we further examine how the content browsing situation of websites moderates the
effect of individual routineness value. Because the top six content groups occupied about 80% of the observations in
this study, we focused our analysis on only these six content groups (636,720 observations left).
We ran an interaction model by adding six interaction terms between the dummy groups and CR; the detailed
estimation results are listed in Appendix 2. Figure 2 presents the coefficient magnitude of each group. The baseline is
the coefficient of CR (0.82). The group 2 and 4 had more benefits from the CR than did other groups, whereas CR
had a negative moderating effect on group 3 and 5.
Lu & Chen: Situations Matter
Page 124
Figure 2. Magnitude of CR’s Impact Across Six Content Groups
As indicated in Table 1, we labelled group 2 and 4 as location-related and virtual community-related web domains;
both groups contain leisure-oriented websites. Figure 3 suggests that users with higher CR are more likely to click
advertisements when they are at these two types of websites, compared to other content websites. For group 3
(automobile-related websites) and 5 (economy and finance-related websites), high CR did not help to increase the
advertisement clicks. This could be because, compared to leisure-oriented groups (i.e., 2 and 4), the content of group
3 and 5 is more function -driven, the stimulation level of these websites is higher than that of virtual communities;
therefore, the role of CR is less important. We thereby postulate that the effects of browsing content routineness might
be more salient in the leisure-oriented websites.
5. Robustness Check
5.1. Excluding Users with Lesser Observations from the Analysis
One of the limitations of our data set is that we only used users’ web clickstream data in our focal intermediary
websites network. Users’ Internet browsing behavior that occurs on other websites outside of this network was found
to be lacking. This may affect the precision of the CR or TR value calculation because part of users’ browsing behavior
history is not available for us. In particular, the situational routineness value of users that had very few observations
in our data set could be biased. In this robustness analysis, we excluded these users and conducted the analysis only
with data from users who had more than 15 observations in our data set, which partially corrected the bias caused by
incomplete observations. The first column in Table 7 reports the result of these 422,244 users. We observed that our
results were consistent with that of the whole sample for users who had relatively complete browsing behavior history.
This suggests that our model is applicable to most users, regardless of whether they have complete or incomplete
browsing behavior history.
5.2. Analysis Excluding the Targeted Observations
In our observations, some advertisements had adopted BT techniques to reach users (the total number of targeted
observations is 41,220, about 5% of our observations). As revealed by the focal intermediary, the company mainly
used frequency-based targeting techniques. For example, as requested by an automobile brand advertiser, its
automobile advertisements only targeted users who have at least visited an automobile-related web page once. The
intermediary would search the database for users who reached this threshold, and display auto advertisements to them
no matter what type of web page they are visiting.
For these targeted users, their click behavior could be affected both by the specific targeting techniques and the
browsing situation. To rule out the influence from the current targeting techniques, we excluded these observations
during the robustness test. The new result of the remaining 760,795 observations is listed in column 3 of Table 7. The
pattern of the results remained unchanged and were very similar to our main results.
5.3. Analysis by Setting Beta = 0.5
While computing the TR, by default, we set beta as 1 to indicate the closeness of t to h, the higher weight it is. In
this robustness test, we set beta = 0.5 to test how the results would change. We observed that the TR mean value
0
0.5
1
1.5
2
2.5
1 2 3 4 5 6
Coefficients
Content Groups
Journal of Electronic Commerce Research, VOL 21, NO 2, 2020
Page 125
decreased from 0.13 to 0.098 (0.052). However, the estimation results of the test remained very similar to that of our
model, as indicated in the last column in Table 7.
Table 7. Regression Results for Robustness Checks
VARIABLES Model Exclude
low- visited users
Model Exclude BT
Observations
Model when
beta=0.5
Routineness
variables
CR 0.598*** 0.693*** 0.613***
(0.0546) (0.0456) (0.0529)
TR -0.600** -1.358*** -1.922***
(0.262) (0.219) (0.374)
CR*TR 2.041*** 2.331*** 3.956***
(0.316) (0.251) (0.437)
Non-
routineness
variables
webfirst -0.706*** -1.084*** -1.071***
(0.0486) (0.0358) (0.0345)
hourfirst 0.0958*** 0.274*** 0.283***
(0.0219) (0.0155) (0.0150)
Control
variables
ImpCidVisit 1.06e-05 0.000657*** 0.000588***
(4.10e-05) (3.62e-05) (3.52e-05)
ClkCidVisit 0.0168*** 0.0101*** 0.0112***
(0.000487) (0.000449) (0.000438)
adCliclprone 9.216*** 11.88*** 11.76***
(0.0784) (0.0607) (0.0587)
Users
demographic
variables
Browser dummy Included Included Included
Cityid Dummy Included Included Included
constant -7.403*** -8.414*** -7.871***
(0.258) (0.219) (0.192)
GroupID Included Included Included
Hour fixed effect Included Included Included
Observation 422,244 760,795 801,998
Likelihood -78285.1 -129814.12 -140061.02
Standard errors in parentheses*** p < 0.01, ** p < 0.05, * p < 0.1
6. Conclusion
This study postulates that not only do users’ personal interest profiles matter in BT, but their individual browsing
situations also affect their advertisement click behavior. We developed two measurements to capture individual
browsing situations: browsing CR and TR. Their relationship with users’ click behavior was empirically examined
with a unique dataset. Our results indicate that users are more likely to click the advertisements when they are visiting
high-CR websites, particularly for leisure-oriented websites. However, users tend to click more during non-routine
times than routine time (high-TR). We also observed that TR and CR have a complementary joint effect to increase
CTR.
6.1. Theoretical Implications
The results of this study contribute to the literature about online targeted advertising in several aspects. First, we
developed a new concept, browsing situation routineness, to examine how the situation familiarity would affect users’
click behavior. To the best of our knowledge, most of the current literature on BT focuses on users’ latent interests or
preference profiling. The impact of individual browsing situations on advertisement clicks have not been examined.
This study fills in this gap and provide a new perspective for the behavioral targeting advertising research.
Secondly, we used the optimal stimulation level theory to explain how the situation familiarity would affect users’
behavior. The results indicate that optimal stimulation level theory can only be used to explain the underlying rationale
of CR, but not TR. Users tend to be more inquisitive about the new information when they are less occupied at non-
routine browsing time, which might overshadow the effects of stimulation level. This finding highlights we may treat
the time factors in behavioral targeting in different way. The result also suggests that optimal stimulation theory might
not be applicable for time factors, which enrich this theory with a new perspective.
Thirdly, the complementary joint effect between CR and TR shows that although TR might not be an independent
factor to affect users’ stimulation level, it could enhance the effects of CR on click probability. The interaction effect
Lu & Chen: Situations Matter
Page 126
is consistent with the optimal stimulation level theory that when users are doing routine activities at a non-routine
time, her/his stimulation level is even lower compared to the situation where users doing routine activities at a non-
routine time, thus more likely to click the advertisements. Our study provides an empirical foundation for further
theory exploration about the combined effects of different targeting factors on users’ click behavior.
6.2. Practical Implications
This study also has several practical implications for online advertisers and intermediaries. First, we developed a
methodology to measure users’ individual browsing situations. The algorithm to measure the CR and TR can be
quickly used by other advertisement intermediaries as they are built on the standard clickstream data.
Second, the empirical examination of the relationship between browsing situation routineness and users’ CTR
provides evidence that situation-based targeting can significantly increase CTR. In particular, for the advertisements
displayed on specific websites during a specific period, assigning advertisements to the users with higher CR and
lower TR can directly help intermediaries achieve higher revenue.
Third, we argue that the proposed browsing situation routineness approach may be valuable for numerous
applications beyond BT practices. For example, this new angle may also provide an extra pricing strategy for
advertisements displayed on advertisement platforms or content sites. If the targeted users have high CR values on a
specific website, advertisers on that specific website may need to pay more than they would need to pay on websites
with lower CR values. The situation routineness method can also address the privacy issue of traditional BT because
it is not purely based on users’ browsing history.
6.3. Limitations
This research had some limitations. First, this data set was provided by an advertisement intermediary; only the
Internet browsing data at the network of this intermediary was visible to us. Although we conducted the robustness
test to prove this is not a concern, we believe that if future research can extend this study with a complete browsing
clickstream data for each user, the results can be more convincing. Second, our data set involves more than 200 online
advertisements, and the intermediary did not provide the information about these advertisements to us; therefore, we
cannot control the effects of advertisements on users’ click behavior. Future research with information about the
advertisements may address this issue and enable an increasingly robust analysis. Third, the performance of this
algorithm partially relies on the cluster results of content groups. Due to the diversity of the contents on the websites
used in this study, we ended up with 39 content groups, we only retained the top six content groups to have a reasonable
moderating effect analysis. If online advertisers had predefined smaller numbers of content groups, the analysis may
have been more complete.
Acknowledgement
Authors would like to thank the financial support from The National Natural Science Foundation of China (Grant
Number 71872050, 71972043 and 71531006). We would also like to thank Za Tingting for her contribution on this
paper.
REFERENCES
Angwin, J. “The Web's New Gold Mine: Your Secrets,” July 30, 2010,
https://www.wsj.com/articles/SB10001424052748703940904575395073512989404
Collier, Joel E. and Barnes, Donald C. “Self-service delight: Exploring the hedonic aspects of self-service,” Journal
of Business Research, Vol. 68, No.5:986-993, 2015.
Bahari, T. Femina, and M. S. Elayidom, “An Efficient CRM-Data Mining Framework for the Prediction of Customer
Behaviour,” Procedia Computer Science, Vol. 46, No. Complete:725-731, 2015.
Baumgartner, H. and J. B. E. M. Steenkamp, “Exploratory consumer buying behavior: Conceptualization and
measurement,” International Journal of Research in Marketing, Vol. 13, No. 2:0-137, 1996.
Belk, Russell W., “Situation Variables and Consumer Behavior,” Journal of Consumer Research, Vol. 2, No. 3:157-
164, 1975.
Berlyne, D. E., Conflict, Arousal and Curiosity, New York: McGraw Hill, 1960.
Bleier, Alexander , and M. Eisenbeiss, “The Importance of Trust for Personalized Online Advertising,” Journal of
Retailing, Vol. 91, No. 3:390-409, 2015.
Cialdini B. Influence: Science and Practice, Allyn and Bacon; 5th edition (August 8, 2008)
Darley, John M., and C. D. Batson, “"From Jerusalem to Jericho": A study of situational and dispositional variables
in helping behavior,” Journal of Personality and Social Psychology, Vol. 27, No. 1:100-108, 1973.
Pentland, Feldman Brian T., “Reconceptualizing Organizational Routines as a Source of Flexibility and Change,”
Administrative Science Quarterly, Vol. 48, No. 1:94-118, 2003.
Journal of Electronic Commerce Research, VOL 21, NO 2, 2020
Page 127
Fiedler, F. E., A theory of leadership effectiveness. In L. Berkowitz (Ed.), Advances in experimental social
psychology. New York: Academic Press, 1964.
Freeman J. and M. T. Hannan, “Niche Width and the Dynamics of Organizational Populations,” American Journal of
Sociology, Vol. 88, No. 6:1116-1145, 1983.
Garfinkel, Harold. “Studies of the Routine Grounds of Everyday Activities,” Social Problems, Vol. 11, No. 3: 225-
50, 1964.
Goldfarb, Avi, and Tucker, Catherine E. “Privacy Regulation and Online Advertising,” Management Science, Vol. 57,
No. 1:57-71, 2011.
Hersey, P. and K. H. Blanchard, “An introduction to situational leadership,” Training and Development Journal, Vol.
23:26–34, 1969.
Hoban P. R. and R. E. Bucklin, “Effects of Internet Display Advertising in the Purchase Funnel: Model-Based Insights
from a Randomized Field Experiment,” Journal of Marketing Research, Vol. 52, No. 3:375-393, 2015.
Li H., V. S. Lai, and X. Luo, “Understanding the Role of Social Situations on Continuance Participation Intention in
Online Communities: An Empirical Perspective,” Journal of Electronic Commerce Research, Vol. 17, No. 3,
2016.
IAB, Targeting local markets: an IAB interactive advertising guide. Interactive Advertising Bureau, 2010. Available
at http://www.iab.net/media/file/IAB_Local_Targeting_ Guide_0922_FINAL.pdf. Retrieved on 5 October 2014.
Lambecht, A. and C. Tucker, “When does retargeting work? Information specificity in online advertising,” Journal of
Marketing Research, Vol. 50, No. 5:561-576, 2013.
Lian S. and Y. Xu, “Family Profiles in Retailing: A Novel Algorithm Combining PU Learning and Feature Selection,”
Japan: PACIS, 2010.
Lu X., X. Zhao, and L. Xue, “Is combining contextual and behavioral targeting strategies effective in online
advertising?” ACM Transactions on Management Information Systems, Vol. 7, No. 1:1-20, 2016.
Moriguchi, Tz and G. Xiong, and X. Luo, “Retargeting Ads for Shopping Cart Recovery: Evidence from Online Field
Experiments,” 2016. Available at
SSRN: https://ssrn.com/abstract=2847631 or http://dx.doi.org/10.2139/ssrn.2847631S
Perrow, C., “A framework for the comparative analysis of organizations,” American Sociological Review, Vol. 32,
No. 2:194-208, 1967.
Raju, P. S., “Optimum Stimulation Level: Its Relationship to Personality, Demographics, and Exploratory Behavior,”
Journal of Consumer Research, Vol. 7, No. 3:272-282, 1980
Sacks, H., On doing ‘‘being ordinary.’’ In J. M. Atkinson & J. Heritage (Eds.), Structures of social action: Studies in
conversation analysis (pp. 413–429). Cambridge, UK: Cambridge University Press,1984.
Simon, Francoise , and J. C. Usunier, "Cognitive, demographic, and situational determinants of service customer
preference for personnel-in-contact over self-service technology," International Journal of Research in
Marketing, Vol. 24, No. 2:163-173, 2007.
Sommers S. Situations Matter: Understanding How Context Transforms Your World, Riverhead Books, 2012.
Stanton J.L. and P. G. Bonner,"An Investigation of the Differential Impact of Purchase Situation on Levels of
Consumer Choice Behavior", in NA - Advances in Consumer Research Volume 07, eds. Jerry C. Olson, Ann
Abor, MI : Association for Consumer Research, Pages: 639-643,1980.
Steenkamp, J.B.E.M. and H. Baumgartner, “The Role of Optimum Stimulation Level in Exploratory Consumer
Behavior,” Journal of Consumer Research, Vol. 19, No. 3:434-448, 1992.
Trusov M., L. Ma and Z. Jamal, “Crumbs of the Cookie: User Profiling in Customer-Base Analysis and Behavioral
Targeting,” Marketing Science Special Issue on Big Data, Vol. 35, No. 3:405-426, 2016.
Yan, J., G. Wang, E. Zhang, Y. Jiang and Z. Chen, How much can behavioral targeting help online advertising?, In
Proceedings of WWW 2009 MADRID, available at http://www2009.eprints.org. Retrieved on 5 October 2014.
Zhang, K. and Z. Katona, “Contextual advertising,” Marketing Science, Vol. 31, No. 6:980-994, 2012.
Lu & Chen: Situations Matter
Page 128
Appendix 1: The Cluster Algorithm
We conducted the clustering in two steps. The first step is to calculate the weight of each keyword among all the
web domains. We adopted the Vector Space Model (VSM) to perform this task. For this model, we used 𝐷𝑖 to
describe each web page.
𝐷𝑖 = (𝑇𝑗 , 𝑊𝑖𝑗 , 𝑗 = 1,2,3, … , 𝑀), 𝑖 = 1,2, … , 𝑁 N web pages, M keywords
𝑇𝑗——Keywords 𝑗
𝑊𝑖𝑗——The weight of keyword 𝑇𝑗at web page 𝐷𝑖 .
TF-IDF was used to compute it. IT-IDF is a widely used method to compute the weight matrix. The keyword TF
used here is the frequency of keywords T at domain D, whereas IDF suggests that if the number of web domains that
include the keyword T is small, then the weight of keywords T at web domain D is higher. Therefore, we differentiated
the web domains. The formula to calculate TF and IDF are as follows:
𝑇𝐹𝑑𝑡 =𝑇ℎ𝑒 𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦 𝑜𝑓 𝐾𝑒𝑦𝑤𝑜𝑟𝑑𝑠 𝑇 𝑖𝑛 𝑤𝑒𝑏 𝑑𝑜𝑚𝑎𝑖𝑛 𝐷
𝑇ℎ𝑒 𝑡𝑜𝑡𝑎𝑙 𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦 𝑜𝑓 𝑎𝑙𝑙 𝑘𝑒𝑦𝑤𝑜𝑟𝑑𝑠 𝑖𝑛 𝑤𝑒𝑏 𝑑𝑜𝑚𝑎𝑖𝑛 𝐷
𝐼𝐷𝐹𝑡 = 𝑙𝑜𝑔𝑇𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑤𝑒𝑏 𝑑𝑜𝑚𝑎𝑖𝑛 (299)
𝑡ℎ𝑒 𝑤𝑒𝑏 𝑑𝑜𝑚𝑎𝑖𝑛 𝑖𝑛𝑐𝑙𝑢𝑑𝑒𝑠 𝑘𝑒𝑦𝑤𝑜𝑟𝑑𝑠 𝑇
𝑊𝑑𝑡 = 𝑇𝐹𝑖𝑗 × 𝐼𝐷𝐹𝑡
We denoted the web domain as D, 𝐷 = 1,2,3, … ,172, keywords as 𝑇, 𝑇 = 1,2,3, … ,352. Therefore, the weight
of keyword T in web domain D is 𝑊𝑑𝑡. After this step, we had a 172 × 352 matrix to indicate the weight of each
domain across 352 types of keywords. After the 172× 352 matrix, we could conduct the cluster to further aggregate
the 172 domains into fewer types based on the similarity of the keywords.
This study used the GMM. We used the package “sklearn” on Python to conduct a detailed clustering process. To
determine the optimal segment number of clusters, we used the “silhouette_coefficient” to pick the number. The
optimal evaluation criteria is the highest of this coefficient, and the lowest of the standard deviation. Table 8 lists the
comparison when the cluster groups equal 37, 39, 41, and 42; we found that when the group numbers = 39, it had the
largest silhouette coefficient and smallest deviation. Therefore, in this study, we used 39 content groups.
Table 8. Model Comparison
Group
numbers
Cluster
Time
Silhouette
coefficient
SD Min Max
37 5 .2552 .0053572 .247 .26
39 5 .2554 .0036469 .249 .268
41 5 .2408 .0075961 .229 .25
42 5 .2466 .0047223 .239 .251
Journal of Electronic Commerce Research, VOL 21, NO 2, 2020
Page 129
Appendix 2: Table 9. Estimation Results of the Six Content Groups
VARIABLES moderating effect of
CR
moderating effect of
TR
moderating effect of
CR and TR
Routineness
variables
CR 0.820*** 1.079*** 0.917*** (0.163) (0.0491) (0.166)
TR -0.578** -0.0102 0.0169 (0.261) (0.610) (0.615)
CR*TR 1.190*** 0.313 0.387 (0.282) (0.294) (0.298)
Non-routineness
variables
webfirst -0.269*** -0.286*** -0.277***
(0.0228) (0.0227) (0.0228)
hourfirst 0.262*** 0.217*** 0.219***
(0.0467) (0.0469) (0.0469)
Control
variables
ImpCidVisit 0.000827*** 0.000811*** 0.000821*** (4.00e-05) (4.00e-05) (4.00e-05)
ClkCidVisit 0.00871*** 0.00871*** 0.00866*** (0.000463) (0.000463) (0.000463)
adCliclprone 12.01*** 11.97*** 11.99*** (0.0670) (0.0668) (0.0670)
Browsesystem dummy -0.0710*** -0.0715*** -0.0724*** (0.0120) (0.0120) (0.0120)
cityid 0.00786*** 0.00796*** 0.00809*** (0.00254) (0.00254) (0.00254)
Group1 1.146*** 1.220*** 1.188*** (0.102) (0.106) (0.129)
Group 2 -0.339** 0.105 -0.529*** (0.135) (0.109) (0.156)
Group 3 -0.116 -0.522*** -0.204 (0.178) (0.194) (0.222)
Group 4 -2.391*** -3.230*** -3.445*** (0.485) (0.674) (0.735)
Group 5 -0.696*** -0.943*** -0.621*** (0.158) (0.162) (0.191)
Group 6 0.0430 0.460** 0.280 (0.268) (0.191) (0.316)
CRXGroup1 0.116
0.122 (0.164)
(0.165)
CRXGroup 2 0.813***
0.767*** (0.186)
(0.187)
CRXGroup 3 -0.690**
-0.705** (0.287)
(0.290)
CRXGroup 4 1.180
0.714 (0.864)
(0.909)
CRXGroup 5 -0.819***
-0.827*** (0.283)
(0.285)
CRXGroup 6 0.459
0.388 (0.497)
(0.494)
TRXGroup1
-0.293 -0.375 (0.600) (0.607)
TRXGroup 2
2.119*** 2.004*** (0.619) (0.628)
TRXGroup 3 0.341 0.627 (1.003) (0.995)
TRXGroup 4 8.086*** 7.830*** (2.943) (3.035)
TRXGroup 5 -1.035 -0.594 (1.051) (1.047)
TRXGroup 6 -1.680 -1.686 (1.245) (1.252)
Constant -5.386*** -5.472*** -5.417***
(0.132) (0.135) (0.151)
Observations 636,720 636,720 636,720
Hour dummy Included Included Included
Likelihood -130758.73 -130792.48 -130657.83