AUTOMATIC CONTENT-BASED EVALUATION OF COMPANIES' FACEBOOK MESSAGES: APPROACHES AND BASELINE IMPACT
Word count: 28,259
Henri De Bruyn
Student number: 000130683248
Supervisor: Prof. Dr. Dirk Van den Poel
Master’s Dissertation submitted to obtain the degree of: Master in Business Engineering: Data Analytics
Academic year: 2018-2019
Confidentiality agreement
PERMISSION
I declare that the content of this Master’s Dissertation may be consulted and/or reproduced, provided
that the source is referenced.
Name student: Henri De Bruyn
Signature
Foreword
This master's dissertation is the closing piece of my education in Business Engineering, with a master's in Data
Analytics. I would like to take the opportunity to express my great appreciation to the people
who supported me and made it possible to write this thesis. First of all, I would like to express my deep
gratitude to Professor Van den Poel for being my supervisor and for teaching me valuable insights into data
analytics over the last years. Secondly, I would like to offer my special thanks to Assistant Professor
Meire. My research would have been impossible without his support. He guided me through
my dissertation by giving valuable critiques over time. In addition, Dr. Meire was always open to
scheduling a meeting and kept me on the right track where needed. I would also like to thank my
grandfather, who regularly made time to proofread my thesis, focusing on grammar. Next,
I would like to thank my parents, who have supported me throughout my life, especially over the last years
during my studies. I am profoundly grateful to my father, who assisted me with setbacks during this thesis
and always helped me with whatever I was struggling with. Heartfelt gratitude goes to my mum for
always providing me with delicious food and lovely talks whenever I needed them most during the writing
of my thesis. Finally, I would like to extend my thanks to my friends, who always tried to help me with any
question or problem I was facing.
Contents
Foreword
List of Abbreviations
List of Tables
List of Figures
1 Introduction
1.1 Problem definition
1.2 Objectives and research question
2 The rise of Social Media
2.1 Social Media
2.2 Social Media Strategy
2.3 Brand fan pages and posts
3 Classification review
3.1 General overview
3.1.1 (Base) Content Approach
3.1.2 Message Strategy Approach
3.1.3 Marketeer’s Orientation Approach
3.1.4 Viral Marketing Rules Approach
3.1.5 Unsupervised Approach
3.1.6 Media Type Approach
4 (Base) Content Approach
4.1 Literature review
4.2 Classification variables
4.2.1 Entertainment
4.2.2 Information
4.2.3 Transaction
5 Message Strategy Approach
5.1.1 Literature review
5.2 Classification variables
5.2.1 Functional
5.2.2 Experiential
5.2.3 Emotional
5.2.4 Brand Resonance
6 Marketeer’s Orientation Approach
6.1 Literature review
6.2 Classification variables
6.2.1 Task-oriented
6.2.2 Relationship/Interaction-oriented
6.2.3 Self-oriented
7 Viral Marketing Rules Approach
7.1 Literature review
7.2 Classification variables
7.2.1 Event
7.2.2 Product
7.2.3 Promotion
7.2.4 Entertainment
8 Unsupervised Approach
9 Media Type Approach
9.1 Interactivity
9.2 Vividness
10 Valence
11 B2B vs B2C
12 Human classification coding
12.1 (Base) Content Approach
12.2 Message Strategy Approach
12.3 Marketeer’s Orientation Approach
12.4 Takeaways
13 Methodology
13.1 Data
13.2 Model description
13.3 Data preparation
13.4 Unsupervised algorithm
13.5 Supervised algorithm
13.5.1 Singular Value Decomposition
13.5.2 Random Forest
13.5.3 Performance
13.6 Basic impact analysis
13.6.1 Independent variables
13.6.2 Dependent variables
14 Results
14.1 Descriptive statistics
14.2 Topic model
14.3 Random Forest
14.3.1 Information model
14.3.2 Evaluation supervised approaches
14.4 Basic impact approach
14.4.1 Model evaluation
14.4.2 Estimation results
14.4.3 Number of reactions
14.4.4 Number of shares
14.4.5 Number of comments
15 Discussion and managerial implications
15.1 Selection of Classification approach
15.2 Enhancing Brand post popularity
15.2.1 Enhancing the number of reactions
15.2.2 Enhancing the number of shares
15.2.3 Enhancing the number of comments
16 Summary
References
Appendix
List of Abbreviations
AIC Akaike Information Criterion
AMT Amazon Mechanical Turk
AUC Area Under the Receiver Operating Characteristic Curve
B2B Business to Business
B2C Business to Consumers
BCA Base Content Approach
BP Brand post
BPP Brand post popularity
CA Content Approach
CE Customer's engagement
DTM Document term matrix
DV Dependent variable
e.g. exempli gratia
etc. et cetera
IM Isolated model
IV Independent variable
LDA Latent Dirichlet allocation
MAM Multi Approaches model
MDA Mean Decrease in Accuracy
MDG Mean Decrease in Gini
MGC Marketeer-generated content
MOA Marketeer's Orientation Approach
MSA Message Strategy Approach
MTA Media Type Approach
NBR Negative Binomial Regression
OLSR Ordinary Least Squares Regression
OOB Out-of-Bag
RF Random Forest
SL Supervised learning
SM Social Media
SMA Social Media analysis
SMM Social Media marketing
SMS Social Media strategy
UGC User-generated content
UGT Uses and Gratifications theory
UL Unsupervised learning
VMRA Viral Marketing Rules Approach
WOM Word of mouth
List of Tables
TABLE 1: SOCIAL MEDIA CHANNEL USAGE (ASHLEY & TUTEN, 2015)
TABLE 2: CLASSIFICATION APPROACHES
TABLE 3: LITERATURE REVIEW: (BASE) CONTENT APPROACH
TABLE 4: COMMON MESSAGE THEMES OF EACH CLASSIFICATION VARIABLE: (BASE) CONTENT APPROACH
TABLE 5: LITERATURE REVIEW: MESSAGE STRATEGY APPROACH
TABLE 6: MESSAGE STRATEGY USAGE (ASHLEY & TUTEN, 2015)
TABLE 7: COMMON MESSAGE THEMES OF EACH CLASSIFICATION VARIABLE: MESSAGE STRATEGY APPROACH
TABLE 8: LITERATURE REVIEW: MARKETEER'S ORIENTATION APPROACH
TABLE 9: COMMON MESSAGE THEMES OF EACH CLASSIFICATION VARIABLE: MARKETEER'S ORIENTATION APPROACH
TABLE 10: COMMON MESSAGE THEMES OF EACH CLASSIFICATION VARIABLE: VIRAL MARKETING RULES APPROACH
TABLE 11: LITERATURE REVIEW: MEDIA TYPE APPROACH
TABLE 12: OVERVIEW HUMAN CODING CLASSIFICATION
TABLE 13: DIFFERENT MODELS OF IMPACT APPROACHES
TABLE 14: INDEPENDENT VARIABLES
TABLE 15: AUC'S OF THE DIFFERENT CLASSIFICATION VARIABLES
TABLE 16: EVALUATION OF THE DIFFERENT APPROACHES
TABLE 17: ESTIMATION RESULTS FOR BRAND POST POPULARITY, MULTI APPROACHES MODEL
TABLE 18: ESTIMATION RESULTS FOR BRAND POST POPULARITY, TOPIC APPROACH
TABLE 19: ESTIMATION RESULTS FOR BRAND POST POPULARITY, BASE CONTENT APPROACH
TABLE 20: ESTIMATION RESULTS FOR BRAND POST POPULARITY, CONTENT APPROACH
TABLE 21: ESTIMATION RESULTS FOR BRAND POST POPULARITY, MESSAGE STRATEGY APPROACH
TABLE 22: ESTIMATION RESULTS FOR BRAND POST POPULARITY, MARKETEER'S ORIENTATION APPROACH
List of Figures
FIGURE 1: ‘NUMBER OF SOCIAL MEDIA USERS WORLDWIDE FROM 2010 TO 2021 (IN BILLIONS)’, 2019
FIGURE 2: ‘MOST POPULAR SOCIAL NETWORKS WORLDWIDE AS OF APRIL 2019, RANKED BY NUMBER OF ACTIVE USERS (IN MILLIONS)’, 2019
FIGURE 3: CONCEPTUAL FRAMEWORK (DE VRIES ET AL., 2012)
FIGURE 4: CONCEPTUAL FRAMEWORK (CVIJIKJ & MICHAHELLES, 2013)
FIGURE 5: CONCEPTUAL FRAMEWORK (SABATE ET AL., 2014)
FIGURE 6: CONCEPTUAL FRAMEWORK (SWANI ET AL., 2017)
FIGURE 7: CONCEPTUAL FRAMEWORK
FIGURE 8: REACTIONS POSSIBILITIES ON A FACEBOOK POST (KRUG, 2016)
FIGURE 9: OPTIMAL NUMBER OF TOPICS
FIGURE 10: TOP 10 TERMS OF EACH TOPIC
FIGURE 11: AUC-CURVES OF THE DIFFERENT CLASSIFICATION APPROACHES
1 Introduction
Social Media (SM) platforms have grown exponentially over the last years. Networking sites
like Facebook, Twitter and LinkedIn have shown an exceptional increase. Users have shifted from
traditional communication channels like mail and post to micro-blogging and SM platforms (Setty, Jadi,
Shaikh, Mattikalli, & Mudenagudi, 2014). In the past, companies tried to build relationships
with their customers through traditional marketing activities like public relations and direct marketing.
Nowadays, the passive customer-company relationship has shifted towards an active relationship in which
customers become co-creators online, which creates multiple opportunities to increase word
of mouth (WOM) and engagement towards the brand (Jahn & Kunz, 2012). The growth of SM
platforms has resulted in an increasing availability of data. Every day, SM databases become richer,
which has made “Big Data” a hot topic. Facebook is one of the main SM platforms that makes
efficient use of its data. It has even misused its users' data. In 2018, 2.7 million Europeans
were affected by Facebook’s privacy scandal (‘2,7 miljoen Europeanen zijn getroffen door
privacyschandaal Facebook’, 2016). Facebook has been an important factor in the discussion of users’
privacy. Despite this negative publicity, thousands of companies make efficient use of
their Facebook data to develop their marketing strategy. By publishing specific content online
as marketeer-generated content (MGC), they can achieve their marketing goals.
Brands must focus on specific content to increase customers’ motivation to participate online and
become loyal to the brand (Sabate, Berbegal-Mirabent, Cañabate, & Lebherz, 2014). Successful content
is perceived positively by consumers, who add value to it by liking, commenting,
sharing, clicking, etc., which has resulted in an enormous increase in user-generated content (UGC) or
WOM (Goh, Heng, & Lin, 2013). Marketeers want more structure and insight into their data to
know which content is valuable (Setty et al., 2014). Over the last years, efforts have been made to
automatically classify posts into specific content categories. An early effort at classifying content dates to the late 1600s, when
the church tried to track documents that were not of religious content (Hopkins & King, 2010). Similar
techniques were used in the mid-1900s, when the term “content analysis” was used for the first time.
Recently, the growth of digitalized text on SM, web pages, blogs, online texts, etc. has made automatic
content-based evaluation even more important.
Consumers who interact every day on a brand fan page become advocates of that brand. These people
are important for the brand since they can influence other consumers’ opinions or purchase behaviour. In
other words, marketeers who apply their Social Media strategy (SMS) well by posting the right
content are a step ahead in increasing customer engagement (CE) or even in boosting the sales performance of
their customers (Swani, Brown, & Milne, 2014). In addition, Facebook is the new number one way for
companies “to get the word out and bring people in” (Shen & Bissell, 2013).
1.1 Problem definition
The rise of SM has opened a new area of research. Content research on SM posts, tweets and blogs
has become a hot topic over the last years. A lot of research has focused on which content variables drive
consumer engagement in the form of brand post popularity (BPP) (number of likes, number of
comments, number of shares, number of click-throughs, etc.). Despite the increasing popularity of this
topic, we have identified three problems based on previous literature.
Problem 1: What content should marketeers post online (MGC) to increase online customer
engagement? This has become an important question.
De Vries, Gensler, & Leeflang (2012) investigated the impact of the valence of comments on BPP.
Netzer, Feldman, Goldenberg & Fresko (2012) conducted an unsupervised analysis of user-generated
content to extract market structures and insights from it. While these two studies focused more on
UGC, attention has since shifted towards MGC. Kim, Spiller & Hettche (2015) classified
brand posts (BPs) into three categories that reflect the marketeer’s perception (task-oriented,
relationship-oriented and self-oriented content). Cvijikj & Michahelles (2013) studied which content
types (entertainment, information or remuneration) have an impact on BPP. Shen & Bissell (2013)
shifted the focus from content delivery to content exchange, classifying Facebook posts into event,
product, promotion and entertainment (MGC). Different outcomes have been found for UGC and
MGC. While for UGC both information richness and valence influence purchase behaviour,
for MGC only valence plays an important role in the impact on sales (Goh et al., 2013). In other
words, marketeers should play a persuasive role in the SM context by using positive words and phrases in their
posts. Over the last years, more studies have shifted the focus from UGC to MGC by taking the
marketeer’s viewpoint into account. Although this is stated as a problem, we would rather call it a remark
to take into account.
Problem 2: Previous research has focused on one specific content classification framework.
Many studies over the last years have analysed the impact of different content variables on BPP or
audience response. Most of these studies have focused on one specific content classification framework,
and less is known about how these different classification approaches compare. A classification
approach or framework refers to a possible way of classifying Facebook posts into different categories.
Previous research shows a trade-off between studies that focus on one classification framework
combined with a predictive approach (e.g. analysing the impact on BPP) (Cvijikj & Michahelles,
2013; de Vries et al., 2012; Sabate et al., 2014) and studies that compare
different classification methods to arrive at a general model without analysing the impact of the
model on BPP (Tafesse & Wien, 2017). De Vries et al. (2012) classified BPs by vividness,
interactivity, and informational and entertaining content. The framework of Cvijikj & Michahelles
(2013) paid more attention to the content of posts but made use of only one specific classification
framework. Some studies used a predictive approach on the number of likes and comments
(Swani et al., 2014; Swani, Milne, Brown, Assaf, & Donthu, 2017), while other studies were
qualitative in nature (Davis et al., 2014; Jahn & Kunz, 2012; Tafesse & Wien, 2017). Although these
studies each made use of one well-established framework, no study has compared the effectiveness of
different content frameworks.
Problem 3: The classification frameworks of previous research are chosen arbitrarily (not based
on previous literature).
The frameworks used in previous research to categorize social BPs are mostly based on subjective
choice. De Vries et al. (2012) analysed the impact of different post types on customer
engagement. This research was based on a conceptual framework for the determinants of BPP (number
of likes and number of comments). Although this framework applies well to BPs, it has an important
limitation: the authors based the framework on their own preferences without looking at how
frameworks had been used in previous literature. The framework of Cvijikj & Michahelles (2013)
is better aligned with previous research. It made use of the Uses and Gratifications theory (UGT)
and looked at what drives consumers to engage online with a preferred brand. Still, this framework
focused only on the UGT and did not consider other possible approaches to the categorization of BPs.
To our knowledge, the study by Tafesse & Wien (2017) is the first to present a
formalized analysis of BPs. Its classification frameworks were based on an extensive review
of previous SM literature. Still, this study focused more on building one overall classification framework
(consisting of 12 exhaustive and mutually exclusive categories). It did not include an analysis
of the different content classification approaches used in previous research, nor did it
estimate a baseline impact on BPP. Overall, the frameworks in previous research have been too ad hoc
to yield well-established, distinct classification approaches.
Given the growing interest in fostering online engagement between companies and their customers,
companies want to know which content classification approaches are available and to what
extent each approach contributes to the online relationship with consumers. Thus, this thesis has the
following purpose:
To investigate which classification approaches have been used in automatic content
classification of Facebook messages, in order to find out which approach is most suitable for content
classification and to what extent each category has an impact on BPP (likes, shares and comments).
1.2 Objectives and research question
The exponential growth of SM and the associated data has resulted in more research into what content
influences consumers to participate online. Since this topic is quite new, this branch of research still offers
many opportunities for further study. We found three limitations in previous research, which we will
tackle in our study. Remark 1 stated that managers and marketeers want more valuable
insights into what content to post to increase customer engagement. The first studies in Social Media
analysis (SMA) mostly concerned UGC (L. de Vries et al., 2012), while MGC has been getting more attention over
the last few years (Kim et al., 2015; Swani et al., 2017). Since managers want more useful insights
into what content to post on their brand page, we take the same viewpoint and
look at what drives marketeers to post specific content online (MGC). Problem 2 arose from the
limitation that previous research has focused on one specific content classification framework. To our
knowledge, no study has compared the effectiveness of different content classification approaches.
This is the first study to compare different content coding methods together with their relationship
with BPP. Problem 3 stated that previous categorization frameworks are not based on an extensive
review of previous SM literature. Prior research on SM content made use of limited characteristics.
There is a need for a more comprehensive approach to the content categorization of SM posts. The
classification approaches of our thesis will be based on previous literature. Our thesis will take these three
limitations into account.
This master's dissertation intends to answer the following research questions:
RQ1: Which approaches have been used so far for automatic content classification of companies’
Facebook messages, what does each categorization approach look like, and is there a most suitable
model for automatic content classification?
RQ2: What is the impact per category of each classification approach on brand post popularity
(number of likes, shares and comments)?
This research is structured as follows. First, we present the main concepts of SM, SMS and brand fan
pages & posts. Secondly, we will elaborate on previous literature, focusing on which different
approaches have been used in SM content classification. In the next sections, we will also briefly look
at how media types and valence have been used in preceding research, followed by the main differences
in content classification between Business to Business (B2B) and Business to Consumers (B2C)
environments. Subsequently, a short section is dedicated to the human classification of Facebook
messages. Next, we will present the methodology of our research, followed by the findings of our
analysis. Finally, we present the conclusions of our results, to end with limitations of our research and
future research possibilities.
2 The rise of Social Media
Facebook is at this moment the largest social network. In the third quarter of 2018 it had 2.27
billion monthly active users, and this number is still increasing every quarter. In 2012 Facebook exceeded
1 billion monthly active users, which made it the first social network ever to surpass this threshold
(‘Number of monthly active Facebook users worldwide as of 1st quarter 2019 (in millions)’, 2019).
Facebook has become one of the main WOM communication channels for brands. Marketeers perceive it as the
most attractive SM network in a B2C environment (Cvijikj & Michahelles, 2013). Over the last years,
the importance of SM for brands to communicate with their consumers has increased. Nowadays,
companies use SM to improve their customer relationships, services, sales promotions, branding
and research (Ashley & Tuten, 2015). Throughout this paper we will make use of the UGT, which tries to
understand the goals and motivations of individuals for social engagement with different types of posts
(Cvijikj & Michahelles, 2013). It explains that consumer needs for communication are aligned towards
content, relationships and themselves while focusing on mass media communication (Ashley & Tuten,
2015). The UGT explains why people use different types of SM (de Vries & Carlson, 2014; Jahn & Kunz,
2012). Specific to our paper, we will focus on which characteristics of MGC drive people to social
engagement by looking at which content classification approaches have been used in SM
research. By making use of the UGT we will better understand which content marketeers should post to
increase their CE. Before analysing the different classification approaches, we will go deeper into some
of the main concepts used in SM content analysis.
2.1 Social Media
SM is a virtual place on the internet that brings people from different cultural and
geographical backgrounds together on a large scale, and where people can express themselves online by interacting
and sharing their opinions (Tafesse, 2015). It allows people to create and exchange user-generated
content (Jahn & Kunz, 2012; Shen & Bissell, 2013). SM come in many different forms on the
web, including blogs, forums, photo-sharing platforms, social gaming, microblogs, chat apps and, most
importantly, social networks. In 2018, 2.62 billion people used social network sites, which is more
than one out of 4 people in the world. It is predicted that by 2021 the mark of 3 billion active users on
SM will be exceeded (‘Number of social media users worldwide from 2010 to 2021 (in billions)’,
2019) (Figure 1).
Figure 1: ‘Number of Social Media users worldwide from 2010 to 2021 (in billions)’, 2019
Eastern Asia and Northern America are the global regions where SM is most popular, with a penetration
rate of 70%, followed by Northern Europe (‘Global social network penetration rate as of January 2019,
by region’, 2019). Facebook is the leading network site based on active users, followed by YouTube
and WhatsApp, as can be seen in Figure 2 (‘Most popular social networks worldwide as of April
2019, ranked by number of active users (in millions)’, 2019). A trend of the last years
is the switch from advertising on PC to mobile advertising, since mobile devices are taking the global
lead in SM use (‘Market-Revenue Per Internet User’, 2019). Mobile-first platforms such as Instagram or
Twitter have become more popular. Marketeers can advertise their BP’s in order to increase the reach
of their message on the market. America is the country where most ad spending is generated, followed
by China. SM have become the number one place where marketeers can implement their strategy.
Figure 2: ‘Most popular social networks worldwide as of April 2019, ranked by number of active users (in millions)’, 2019
On average, people spend around 20 to 25% of their total time on the Internet and on SM network sites
(Tomaras & Ntalianis, 2015). Global Internet users spend around 135 minutes per day surfing on SM
(‘Social Media Statistics & Facts’, 2019). This exponential increase of SM has changed the way
marketeers should cooperate with their consumers. Each SM site has its own characteristics in terms of
culture and purpose that can be used to execute a specific SMS. This can have a significant impact on
business practices (Kim et al., 2015; Swani et al., 2014). New opportunities have arisen due to
the SM explosion: companies can increase public awareness of their brand or, even better, align
their product development through closer community involvement. For example, companies start online
competitions in which they let consumers cooperate in developing a new product and in which the winning team
wins a job offer (Cvijikj & Michahelles, 2013). People are using SM to get specific two-way interactions
with their brand. They make use of SM sites when traditional communication channels are unavailable,
time-consuming or expensive (Davis et al., 2014).
Table 1: Social Media Channel Usage (Ashley & Tuten, 2015)
The study from Ashley & Tuten (2015) analysed which SM channels are being used by companies. An
overview of the top SM channels can be seen in Table 1. Microblogs (e.g. Twitter), social networking
(e.g. Facebook) and microsites (sites at a separate web address to forward to a friend) were the most
commonly used channels. This leads to the conclusion that marketing communication is happening
where customers are most active nowadays. The data of our research comes from Facebook.
2.2 Social Media Strategy
Social Media marketing (SMM) or SMS is “the usage of the existing SM platforms for increasing the
brand awareness among consumers on online platforms through utilization of the WOM principles”
(Cvijikj & Michahelles, 2013, p. 845). Another definition is “the utilization of SM technologies,
channels, and software to create, communicate, deliver and exchange offerings that have value for an
organization’s stakeholders” (Tafesse & Wien, 2017, p. 4). The efficient use of a well-defined SMS
gives the company a lot of opportunities such as increasing public awareness or making efficient use of
the data provided by SM. B2B and B2C marketeers are using different SMS’s to increase CE with their
target group (Swani et al., 2014, 2017). A main component of implementing an SMS is the brand page
(BP), an online social networking platform through which brands connect with their customers and fans. One of the
main reasons why BP are so important for the marketing strategy is that they allow brands to build an online
community and interact with it (Tafesse & Wien, 2017). Brand communities are communities
recognized by shared values, rituals, myths, hierarchy, vocabulary and traditions, but also by a sense of
moral responsibility. Such a community is a driver of brand commitment, which boosts the relationship between the
brand and the consumers (Gummerus, Liljander, Weman, & Pihlström, 2012; Jahn & Kunz, 2012).
Actively participating in an online brand community removes the physical as well as the temporal barriers,
which increases the likelihood that consumers participate in the online community (Davis et al., 2014).
Secondly, it also improves WOM communication, which is a powerful tool for marketing since
WOM has increased exponentially in volume on SM platforms (Cvijikj & Michahelles, 2013).
More specifically, we are talking about electronic WOM, which comes from the online communication
between consumers who interact on SM posts coming from brands (Kremers, n.d.; Tafesse,
2015). Furthermore, people want to become a member of an online community to increase satisfaction
within the community and to increase their personal degree of influence on other members of the
community (Jahn & Kunz, 2012). Marketeers are capitalizing on the trend of brand communities on SM
by increasing CE and generating WOM, which results in richer information sharing and a better
understanding of the drivers of sales (Goh et al., 2013).
A study from Davis et al. (2014) identified five core elements that drive brand consumption in a SM
community, which can be used as opportunities to increase the activity of a beneficial online community.
(1) Functional brand consumption (e.g. problem solving, information searching, evaluate services etc.),
(2) Emotional brand consumption (e.g. enjoyable interactions, feeling privileged or recognized,
satisfaction etc.), (3) Self-oriented brand consumption (e.g. self-actualization, perception and branding),
(4) Social brand consumption (e.g. social interaction, community attachment, experience exchange etc.)
and (5) Relational brand consumption (e.g. desire to know the people behind the brand and to get
personalized interaction and co-creation of the service offered). Each of these 5 drivers is an
opportunity for brands to strengthen the relationship with their consumers. Brands should pay attention
to these drivers to get value out of the interactivity with their customers.
Over the last decades, different studies have tried to classify BP’s to make efficient use of a company
message strategy. A message strategy is the primary tactic to deliver the key message. It aligns the content
of the brand with the needs of the consumers. Furthermore, it tries to bridge the gap between what
consumers need to hear and what marketeers want to say (Tafesse & Wien, 2017). Brands even use SM
to get more interaction with their customers by introducing new products online, sharing brand related
information or even announcing free giveaways (Newman, 2012). Companies want to increase their
consumer engagement through a well-implemented SMS.
Engagement means interacting and cooperating with community members (Cvijikj & Michahelles,
2013). Another definition describes engagement “as a consumer relationship that recognizes that people
are inherently social and look to create and maintain relations not only with other people, but also with
brands.” (Ashley & Tuten, 2015, p. 17). More specific, we will focus on CE which “entails the
customer’s interactive experience with the brand, is context-dependent and enhances consumers”
(Gummerus et al., 2012, p. 859). Marketeers who adopt the engagement perspective are shifting the
focus from a transactional relationship to an interactional perspective. Applied to SM, it means clicking,
sharing or commenting on a BP. Companies that understand the characteristics that influence the
level of CE, and apply them well to increase their volume of WOM, are a step ahead in brand awareness,
which may result in higher revenue. Secondly, CE has a positive relationship with loyalty and
satisfaction. People who are satisfied with the products of the brand they prefer are more likely to join
a brand community (Gummerus et al., 2012).
A study from Lee, Hosanagar, & Nair (2018) went even deeper, shifting the focus from SMM to
content marketing, with a specific focus on developing content that increases engagement. Content has
become more important in SMS because Facebook posts need to be short and to the point and user
engagement is measured daily. Besides, the SM data of companies grow larger every day, which
makes content even more important. Because of the exponential increase of SM sites, a new marketing
approach was even born: “viral marketing”, which is the spread of the original message of the brand
through consumer interaction. Nowadays, companies make efficient use of this technique to promote
their brand image by focusing on those characteristics that have a higher degree of spread (Shen
& Bissell, 2013). Even in the tourism industry, SM plays an important role. Strategies that are aligned
with SM help destinations to remain competitive (Kiráľová & Pavlíčeka, 2015). SM can increase brand
awareness, brand engagement and WOM when a well-developed communication strategy is implemented.
It even enables visitors to communicate with each other by sharing their opinions about recent experiences.
Posts published on SM can be used to advertise a brand or specific products or services.
2.3 Brand fan pages and posts
Brands connect with their customers and fans by sending messages on a regular basis to the world. These
BP’s appear in consumers’ newsfeeds whether they like the specific brand fan page or not. Companies
can also make use of sponsored posts to increase the reach of the message (Tafesse, 2015). BP’s are a
rich form of communication and serve different goals depending on the meaning of the message.
They enrich the relationship between the brand and the customer and provide information to the
followers (de Vries et al., 2012). BP’s also have the ability to support multiple media types (e.g. photos,
text, links, videos, quizzes etc.). They strengthen the brand’s relationship with its customers. Fans who
follow a brand page (like page) do not only comment on or like the brand’s regular posts, they also interact with
other consumers by liking and reacting to other comments. Automatic response options (e.g. likes &
shares) allow consumers to respond instantly and interactively without putting much effort into it (Tafesse
& Wien, 2017). A brand fan page is in the first place one of the main connections between the consumers
(followers) and the brand (Jahn & Kunz, 2012). It empowers consumers to leave their opinions and
express their feelings which all contribute to the overall richness of the brand (Tafesse & Wien, 2017).
Secondly, it helps companies to communicate on a global level and to do marketing at a personal level
(Cvijikj & Michahelles, 2013). Thirdly, BP’s are a goldmine of information which can deliver social
benefits for its followers (de Vries et al., 2012). It is a useful tool to deepen the relationship between the
brand and a consumer (Jahn & Kunz, 2012). Besides liking, sharing or commenting on a BP, followers
can also send private messages to brand pages for specific personalized questions which support even
more the customer-brand relationship (Tafesse & Wien, 2017). BP’s can have one or more moderators,
who are the owners of like pages and control the page. BP’s can have any number of members, who are
also known as followers. In fact, fans can engage with a company brand page by (1) liking existing posts
posted by the company, (2) sharing posts on their own wall page, (3) posting content on the company’s
wall, and (4) leaving a comment on a Facebook post (Cvijikj & Michahelles, 2013). All these actions
contribute to the implementation of a good SMS through WOM communications.
3 Classification review
Previous literature has studied different content classification frameworks together with the
relationship between content and BPP (Cvijikj & Michahelles, 2013; de Vries et al., 2012; Swani et al.,
2017), audience response (Tafesse, 2015), CE (Lee et al., 2018) or brand loyalty (Shen & Bissell, 2013).
Although these studies use different names for their dependent variable (DV), most of them use the same
variables as a measure of their DV (number of likes, comments & shares). Other studies have focused
more on the qualitative side of research through a survey (Gummerus et al., 2012; Jahn & Kunz, 2012).
As mentioned before, previous classification frameworks are most of the time chosen at random, whether
the study is based on predictive or prescriptive research. This section gives an overview of which
classification approaches have been used in previous research and where each categorization comes
from, meaning which viewpoint and concepts it takes into account.
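To make the shared measure of the DV concrete, the following is a minimal sketch of how likes, comments and shares could be averaged per content category. It is not taken from any of the cited studies: the toy posts, the field names and the function name `mean_engagement_by_category` are assumptions made purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: each post carries a content category and the three
# engagement counts that most studies use as a measure of their DV.
posts = [
    {"category": "information", "likes": 120, "comments": 10, "shares": 5},
    {"category": "entertainment", "likes": 300, "comments": 40, "shares": 25},
    {"category": "information", "likes": 80, "comments": 6, "shares": 3},
    {"category": "transaction", "likes": 150, "comments": 60, "shares": 12},
]

METRICS = ("likes", "comments", "shares")


def mean_engagement_by_category(posts):
    """Return the mean of each engagement metric per content category."""
    grouped = defaultdict(list)
    for post in posts:
        grouped[post["category"]].append(post)
    return {
        category: {metric: mean(p[metric] for p in items) for metric in METRICS}
        for category, items in grouped.items()
    }


summary = mean_engagement_by_category(posts)
for category, metrics in summary.items():
    print(category, metrics)
```

A descriptive table like this only summarizes the data; it does not replace the models used in the literature to estimate the impact of a category on BPP.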
First, this section gives a general overview of previous literature that focused on content
classification of (Social Media) messages. Second, we give a first brief overview of the
different (content) classification approaches that have been found in literature. In the following sections,
a deeper analysis of the different approaches will be given, followed by a first look at how
media elements and valence have been used in previous content literature.
3.1 General overview
Exhibit 1 provides a representative overview of literature that has focused on the classification of
different SM posts. For each study, we checked whether the following characteristics are involved
in the specific research. (1) Media elements: does the framework take photos, links, videos etc. into
account? (2) Main focus: is the main focus of the study on the content itself (e.g. angry, valence, sad
etc.) or on the types of content (e.g. entertainment, information, transaction etc.)? (3)
Learning approach: is the classification of the content messages based on a supervised learning (SL) or on an
unsupervised learning (UL) approach? (4) Dependent variable: does the DV focus on engagement (e.g.
number of likes, reactions etc.) or on sales (e.g. repeat purchase behaviour)? For each study the
research method (predictive / descriptive / exploratory...), industry and data source are also given,
followed by what the classification framework looks like and some extra information on the
framework.
Facebook has been the most used data source for content analysis. However, Twitter has become more
popular in later SM content studies. There has been a main focus on content classification through SL
as well as on engagement as a DV. Some papers have focused more on the possible types of content,
while other papers have focused on what sentiment can be found in the content (content itself).
Most of the early studies that analysed the relationship between Facebook posts and BPP also took
media elements into account. Over the last years there has been a shift to studies that focus more
on the content of posts. However, there is still a large potential for further research on content analysis.
First of all, only a few studies have used a UL approach (Netzer et al., 2012; Zhang, Moe, &
Schweidel, 2017). Secondly, online engagement has been studied widely over different types of SM, but
few studies have analysed the relationship between content and the
sales of a company (Goh et al., 2013; Rishika, Kumar, Janakiraman, & Bezawada, 2013). Thirdly, to our
knowledge, no study has compared different framework approaches of content classification to check if
there is a “best” model for content classification. This research is the first study that provides a
classification literature review and proposes distinct classification
approaches that can be used as a starting point for further content research in SM. Table 2 gives an
overview of the different classification approaches.
Table 2: Classification Approaches
Approach | Classification
(0) Base Content Approach (BCA) | Information; Entertainment/transaction
(1) Content Approach (CA) | Information; Entertainment; Transaction
(2) Message Strategy Approach (MSA) | Functional; Experiential; Emotional; Brand resonance
(3) Marketeer's Orientation Approach (MOA) | Task-oriented; Relationship/Interaction-oriented; Self-oriented
(4) Viral Marketing Rules Approach (VMRA) | Promotion; Product; Entertainment; Event
Unsupervised Approach (UA) | *
Media Type Approach (MTA) | Interactivity; Vividness
(0, 1, 2, 3 & 4) have a pre-classified framework (Supervised Approach).
* Unsupervised Approach has no pre-classification framework (e.g. Factor Analysis or Topic Analysis).
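For readers who want to work with these frameworks programmatically, the approaches of Table 2 can be written down as a simple mapping from approach to categories. The sketch below is only an illustration: the dictionary layout and the helper names `is_valid_label` and `ca_to_bca` are assumptions and are not part of any cited framework. The only relationship it encodes is that the BCA merges the entertainment and transaction categories of the CA.

```python
# Categories per approach, transcribed from Table 2. The data structure and
# the helper names below are illustrative assumptions.
APPROACHES = {
    "BCA": ["information", "entertainment/transaction"],
    "CA": ["information", "entertainment", "transaction"],
    "MSA": ["functional", "experiential", "emotional", "brand resonance"],
    "MOA": ["task-oriented", "relationship/interaction-oriented", "self-oriented"],
    "VMRA": ["promotion", "product", "entertainment", "event"],
    "UA": [],  # unsupervised: no pre-classified categories
    "MTA": ["interactivity", "vividness"],
}


def is_valid_label(approach, label):
    """Check whether a label belongs to the pre-classified framework of an approach."""
    return label in APPROACHES.get(approach, [])


def ca_to_bca(label):
    """Collapse a Content Approach label into its Base Content Approach label."""
    if not is_valid_label("CA", label):
        raise ValueError(f"unknown CA label: {label!r}")
    if label == "information":
        return "information"
    # The BCA merges entertainment and transaction into one category.
    return "entertainment/transaction"


print(ca_to_bca("transaction"))
```

Keeping the labels in one place makes it easy to check manually coded posts against the framework they are supposed to follow.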
Different ad hoc approaches can be found in previous literature. Tafesse & Wien (2017) classified
previous literature into 3 main ad hoc approaches. We have updated and analysed these 3 main approaches
and added a 4th approach that is worth mentioning. All 4 classification approaches made use
of SL to classify the different messages into the right category. The qualitative research from Tafesse &
Wien (2017) has only focused on specific possibilities of classification. This study already gives us a
good insight into a “best” model of content classification and the different approaches that have been used in
the past, although it did not take the impact of the different variables on CE into account. Still, to our
knowledge this research was the first to compare different classification frameworks that have
been used in previous literature, and it came up with a categorization of 12 exhaustive and mutually
exclusive categories of BP’s. Firstly, the framework can be used on a daily basis by marketeers to inspire
new BP’s. Secondly, it can be used to tune a company’s content strategy. For example, promotional
BP’s can be used to stimulate sales, and customer relationship posts can be used to build a brand
community. Although Tafesse & Wien (2017) already mentioned the 3 different approaches in their
research, they did not go deeper into them or analyse exactly how these
frameworks are derived from previous studies. Our research will take this limitation into account and
will look at the source of the different approaches. The research from Tafesse & Wien (2017) already
gave us a better understanding of the different content approaches that have been used. In addition, it
helps us to distinguish different content variables and to understand them better, which we will use in our
proposed frameworks.
3.1.1 (Base) Content Approach
The first approach tries to differentiate posts based on entertainment, information & transaction (Cvijikj
& Michahelles, 2013; de Vries et al., 2012). However, this type of classification cannot accommodate
some types of posts. Where should posts about brand resonance (posts about the identity of the brand) or
relationship-oriented posts (e.g. customer feedback, customer testimony, Q&A) be classified? Tafesse
(2015) took this limitation into account in his research and classified these 3 variables into one group variable,
“content-type”. Cvijikj & Michahelles (2013) also took posting time & media type into their model,
which limits what it reveals about the content of posts. Tafesse (2015) also took vividness, interactivity,
novelty and brand consistency into his model besides content type, which gave his framework a larger scope
than previous research. Our study will take this CA into account (information, entertainment
and transaction) and analyse how this framework has been used in previous literature. Secondly, our
study has come up with a BCA framework, which will be used as a base categorization and consists
of only 2 categories: information and entertainment/
transaction. Although previous research from Lee et al. (2018) already came up with a base model that
classified Facebook posts into brand personality-related and directly informative posts, that framework
still lacks some specific types of posts. Consequently, this study came up with a new
base classification approach which is, to our knowledge, the most suitable.
Although we call this approach the “content” approach, this does not mean that the other approaches did
not focus on the content of the messages. The difference between the (Base) Content Approach and the
other classification approaches is that the BCA and the CA take the content of the message as the starting
viewpoint to classify different messages, while the other approaches take a viewpoint other than content
as a starting point (e.g. starting by looking at which message strategies are used, instead of looking at
the different types of content). Despite the different starting points, each message will still be
classified based on its content.
3.1.2 Message Strategy Approach
The second approach takes some traditional message strategies into account, such as functional, experiential,
emotional and brand resonance, while ignoring several other message strategies (Ashley & Tuten, 2015;
Swani et al., 2014, 2017). It reviews which strategies can be used to bridge the gap between what a
marketeer wants to say and what the consumer needs to hear. While these studies give us a good insight
into how different message strategies are applied to brand posts, they fail to consider what is really
stated in the post (content of the posts).
3.1.3 Marketeer’s Orientation Approach
The third approach is derived from previous literature that focused on the consumers’ perceived
scheme (e.g. social, functional, and self-concept categories) (Davis et al., 2014; Jahn & Kunz, 2012).
These studies have put the attention on the subjective meaning of the consumers. Compared to these
studies, we will focus our attention on the marketeer’s perception of customer engagement.
The study from Kim et al. (2015) already took this viewpoint into account and came up with the
following three types of orientation marketeers have when using SM. Through task-oriented
communication they want to achieve a goal (e.g. increase sales of the company). A second
orientation is based on interaction. Marketeers that make use of this orientation want to strengthen the
relationship with their customers (e.g. increase customer [...]). With the third, self-oriented, orientation
the marketeer is primarily concerned with its own desires and needs
when interacting with others (e.g. increase brand awareness). Self-oriented messages focus on the
personal thoughts and feelings of the brand. Kim et al. (2015) analysed the impact of this approach on
the brand’s perceived intention to post messages online.
3.1.4 Viral Marketing Rules Approach
A fourth approach, which is new compared to Tafesse & Wien (2017), classifies posts into event,
product, promotion and entertainment (Shen & Bissell, 2013). Compared to other approaches, this
framework focuses on the viral marketing rules (increase awareness of a specific post). Even though this
classification framework is a good approach on its own, it has too much overlap with the 3 previous
approaches. We will therefore not take this model into the methodology of our study, which will be further
explained later. Still, it is worth mentioning what this approach looks like for companies that want to classify
posts specifically based on the viral marketing rules.
3.1.5 Unsupervised Approach
Unsupervised classifications have been used less than supervised classification approaches. They
try to get meaningful insights & information out of unstructured data. Zhao, Jiang, Weng, He, & Lim
(2011) came up with the topics arts, business, education, style, tech-science and world
for data coming from New York Times articles, and with the topics arts, business, family & life and twitter
for data coming from Twitter. Netzer et al. (2012) found 3 topics (school, finance and politics)
coming from business school data. It is clear that topics and themes are inherent to the characteristics of
the data.
3.1.6 Media Type Approach
In the content analysis literature of SM posts, media type elements have commonly been added to the
framework, next to the other supervised classification variables. Previous research has looked at how
media elements (videos, pictures, links, url’s etc.) have an impact on online engagement of brand pages.
Media types can be classified into two categories. Vividness relates to the extent to which a specific type
of media stimulates one of our five senses (Cvijikj & Michahelles, 2013; de Vries et al., 2012).
Interactivity focuses on the interaction between two parties, and looks at how interactive the post
appears to the brand followers (de Vries et al., 2012; Tafesse, 2015).
The exponential increase of SM has resulted in an increasing number of studies focusing on content
classification of SM posts. This has resulted in too many different classification frameworks. Researchers
have been using their own preferred classification approach without looking at how it had been
done in previous literature. By giving an overview of the different studies in Exhibit 1, each
with its own characteristics, we have tried to give the best possible summary of previous literature that
focused on message content classification. Different frameworks have emerged over the last years, all
with their own viewpoint and characteristics. Although the classification of posts into a specific
framework remains quite subjective, our research tries to categorize the different approaches used into
one of the specified classification frameworks (BCA, CA, MSA & MOA). We also added an extra
supervised approach (Viral Marketing Rules Approach). As mentioned before, we will not take this
approach into our analysis since it has too much overlap with the other supervised approaches. We will
only look at how this approach is derived from previous literature.
In the next sections, we will go deeper into each of the established approaches and look at how each
framework is related to previous literature. In addition, a deeper meaning of each classification variable
is given, together with a literature review of how the classification variables of each approach are
related to BPP. Furthermore, we will also look at how unsupervised approaches, media elements and
valence have been used in SM literature.
4 (Base) Content Approach
4.1 Literature review
Based on the review of previous classification literature, we have identified one base (main) content
classification framework that consists of information and entertainment/transaction. The other
content approach splits entertainment and transaction into two separate content variables. Compared
to the other classification approaches, these two models take the content provided by the messages
as a starting point. A literature review of the base content approach and the content approach can be
found in Table 3.
Research from Lee et al. (2018) already came up with a standard two-feature classification framework:
brand personality-related variables, which consist of emotions, humour, small talk etc., and directly
informative variables, which consist of mentioning deals, price or products in the BP. Although this
is a suitable framework, it has some limitations. First of all, the framework from Lee et al. (2018) does
not make a distinction between informational posts and transactional posts. Posts that want to stimulate
transactions between the company and the consumers by mentioning deals, sweepstakes or price
discounts are all stored under the informational posts. Secondly, brand personality-related posts take the
emotional side of posts more into account, which is more related to how it is being said. As mentioned
before, our classification approaches focus on what is being said by looking at what content is provided
in the messages. That is why we have come up with a new base classification approach that classifies
posts into information and entertainment/transaction. This classification is aligned with previous
advertising applications of banners. Still, the research from Lee et al. (2018) helped our research in
setting up the different meanings of the categories of the approaches used in our study.
Table 3: Literature Review: (Base) Content Approach
Authors | Informational | Entertainment | Transaction
(Entertainment and Transaction together form the Entertainment/transaction category of the BCA.)
Cvijikj & Michahelles (2013) | information | entertainment | remuneration
Tafesse (2015) | informational content | entertaining content | transactional content
Lee et al. (2018) | directly informative (brand mention, price, product mention) | brand personality-related (holiday mention and humor used) | directly informative (deals, price compare, discounts etc.)
de Vries et al. (2012) | informational content | entertaining content | /
Stephen et al. (2015) | information | arousal-oriented | /
Swani et al. (2014) | information search | / | selling strategy (calls to purchase)
Swani et al. (2017) | information search | / | selling strategy (calls to purchase)
Goh et al. (2013) | content information richness | / | /
Gummerus et al. (2012) | / | entertainment | /
Setty et al. (2014) | / | life events and entertainment posts | /
The study from de Vries et al. (2012) analysed the impact of different post types on BPP. The
framework consisted, among other features, of informational content and entertainment content,
which are posts that are perceived as fun and exciting to read (Figure 3). Still, this framework lacked the
possibility to classify posts about transactions or remuneration-based posts (sweepstakes, deals, bonuses
etc.). Although this is quite a good first approach to content classification, the research did not mention
on which classification theories or previous frameworks the model was based. It looks like the
categorization was rather a first guess at a possible framework.
Figure 3: Conceptual Framework (de Vries et al., 2012)
A later study from Cvijikj & Michahelles (2013) expanded the research from de Vries et al. (2012).
It took into account not only the content of a BP, but also the time at which the content should be posted
(which was a control variable in the study from de Vries et al.) (Figure 4). Secondly, it added a third
variable (remuneration) to the content type category. An improvement over the study from de
Vries et al. (2012) is that the new framework is based on the UGT: it looks at which factors drive or
motivate consumers to engage online instead of using an “at random” chosen categorization. A first look
at how these variables are related to online engagement showed that entertainment posts have the highest
level of engagement. In addition, information-related posts increase the number of likes and comments.
To increase the number of comments, moderators should make use of remuneration posts.
Figure 4: Conceptual Framework (Cvijikj & Michahelles, 2013)
Still, we do not find the third category, remuneration, straightforward. Is it also applicable to posts
related to loyalty programs or links for payment, or does it only cover sweepstakes? That is why we have
changed this variable to transaction, which has a broader scope than remuneration alone. Besides price
promotions, sweepstakes & loyalty programs (remuneration), it covers everything that aims at a
transaction between the consumer and the brand. Tafesse (2015) also made use of this
category when analysing how these different types of content influence customers’ responses to Facebook
posts.
Stephen, Sciandra, & Inman (2015) cover a broader range of content characteristics
than Lee et al. (2018) and focus on what branded content says (information &
calls to action) or how it is said (arousal- & persuasion-oriented). Two of the four content
characteristics are aligned with our CA. First of all, the arousal-oriented content characteristic tries to
elicit positive responses from consumers (positivity and humour). This characteristic is most aligned with
the entertainment category of our approach. The main difference is that the arousal characteristic looks
more at how it is said, while the entertainment characteristic looks at what is said. Secondly, the
information content characteristic refers to how much the post is associated with informational cues. It
focuses on product-related information, value-related information (value- or price-related information)
or brand-related information. This category is well aligned with the information category of our content
framework.
Although this classification is a good standard approach, the framework will be less effective
for some message strategies (e.g. where should we classify posts about brand resonance or social
cause?) (Tafesse & Wien, 2017). To overcome this limitation, we will also include the Message Strategy
Approach in our study.
4.2 Classification variables
For each categorization approach, we will also give an overview of common message themes used for
each variable of the accompanying classification approach. It gives a good and quick understanding of
the categorization framework. Table 4 gives an overview of the common subjects for entertainment,
information and transaction.
Table 4: Common message themes of each classification variable: (Base) Content Approach
| Variable | Common message themes |
|---|---|
| Entertainment | Funny, humorous, humorous items, artistic works, events etc. |
| Information | Product specifications, product reviews, product recommendations etc. |
| Transaction | Sweepstakes, deals, bonuses, promotions, discounts, loyalty programs, links for payment etc. |
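To illustrate how the themes in Table 4 could be turned into a first, rule-based labelling step, the sketch below counts theme keywords per variable and returns the variable with the most hits. The keyword lists, the function names and the tie-breaking rule are assumptions made purely for illustration. They are not taken from the studies cited above and do not represent the supervised models of this study.

```python
import re

# Hypothetical keyword stems, loosely inspired by the themes in Table 4.
THEME_KEYWORDS = {
    "entertainment": ["funny", "humor", "humour", "joke", "artistic"],
    "information": ["specification", "review", "recommend"],
    "transaction": ["sweepstake", "deal", "bonus", "promotion", "discount", "loyalty"],
}


def theme_scores(post):
    """Count, per variable, how many keyword stems start a word in the post."""
    text = post.lower()
    return {
        theme: sum(1 for kw in keywords if re.search(r"\b" + re.escape(kw), text))
        for theme, keywords in THEME_KEYWORDS.items()
    }


def classify_base_content(post):
    """Return the variable with the most keyword hits, or None if nothing matches.

    Ties are resolved in favour of the variable listed first in THEME_KEYWORDS.
    """
    scores = theme_scores(post)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify_base_content("Get a 20% discount on all deals this weekend!"))
```

Such a keyword count is only a rough baseline: a post without any listed keyword stays unlabelled, which is one reason why a trained classifier is usually preferred.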
4.2.1 Entertainment
Entertainment is “the act of providing or being provided with amusement or enjoyment” (‘Definition of
entertainment’, n.d.). Posts with an entertainment characteristic are perceived to be fun, exciting and
cool. These kinds of posts are most of the time unrelated to the brand or a product (e.g. anecdotes,
slogans, word play, humorous items, artistic works). They encourage people to contribute to the content
(Cvijikj & Michahelles, 2013; de Vries et al., 2012; Tafesse, 2015) and stimulate direct interaction
between the brand and its consumers (e.g. Q&A, survey) (Shen & Bissell, 2013). Common subjects are
movies, TV shows, series etc. (Lee et al., 2018). An example of this type of post is
“What a lovely day, what are your plans today?”
De Vries et al. (2012) found that entertainment posts have a negative impact on
the number of likes. This could be because this content is unrelated to the brand and
consumers are not interested in it. Later research from Cvijikj & Michahelles (2013), however, found that
entertainment posts have a positive impact on the like and comment ratios compared to
non-entertainment posts. Entertainment also had the strongest impact compared to the information and
remuneration content types. Brand entertainment content posts have a positive impact on the number of
likes of a BP (Tafesse, 2015). Gummerus et al. (2012) found a significant positive correlation between
customers’ perceived entertainment benefits and customer satisfaction. Stephen et al. (2015) found a
positive relationship for posts that are perceived to be funny or humorous. Surprising articles are more
likely to make the NYT’s most e-mailed list (Berger & Milkman, 2012).
4.2.2 Information
Information is another important characteristic of BP’s. Informational content is about product
specifications, product reviews & product recommendations (Tafesse, 2015). A post that is rich in
information (e.g. launch of a new product, new industry segment) will increase brand fans’ motivation
to contribute online. Furthermore, information was found to be one of the main factors for online CE
in the form of consumption and value creation (Cvijikj & Michahelles, 2013). Research showed a
positive attitude of consumers towards informational posts (de Vries et al., 2012). Stephen et al. (2015)
took a broader view of information-related posts, which they classified into product-related,
value-related & brand-related information, where value-related information covers posts that mention
value- or price-related information such as discounts or promotions. This characteristic “value-related
information” refers more to our transaction variable, while product-related information fits well here.
Informational BP’s enhance brand popularity based on the like and comment ratios. But the study from
de Vries et al. (2012) showed an inconclusive effect (no effect vs. positive effect). Swani
et al. (2017) found a negative relationship between information search posts (messages containing cues
and links that aim for information search) and the number of likes and comments. Posts containing the
product price or a price comparison have a negative relationship with BPP (Lee et al., 2018). Information
about product availability also has a negative impact on the number of likes, and information on how to
obtain the product has a negative impact on the number of comments (Lee et al., 2018).
4.2.3 Transaction
The third variable of the first supervised model is transaction. This characteristic includes
everything that is linked to remuneration (e.g. sweepstakes, bonuses, promotion deals, discounts, loyalty
programs etc.). Moreover, it refers to posts that include direct links to order and pay for a product
(Tafesse, 2015). The major aim of such a post is to end with a specific transaction between the company
and one or more consumers. It has a much broader scope than the promotion variable that we will use
in the VMRA. Swani et al. (2017) found a negative relationship between direct calls to purchase and BPP.
5 Message Strategy Approach
5.1.1 Literature review
Model 2 classifies posts into functional, experiential, emotional and brand resonance. This
categorization focuses on traditional message strategies rather than the content focus of the
previous approach (Tafesse & Wien, 2017). Still, while taking another viewpoint on classification, we
distinguish the different posts by the content they provide. The framework is derived from the study
from Ashley & Tuten (2015), which conducted a content analysis of the creative (message) strategies in
SMA. It investigated which types of messages companies post (what is their SMS?) and how these
channels and strategies relate to maximizing social engagement with consumers. More valuable
to our research, it came up with a categorization of the top creative strategies. Functional appeals are the
most commonly used message strategy, followed by resonance and experiential appeals. An overview of
the most common message strategies used can be found in Table 6. The category resonance from Ashley
& Tuten (2015) (which focuses on the interaction between image and words) is, in our opinion, too
vague to apply in practice. This explains why we use brand resonance as the fourth category of the MSA.
This category covers everything that is based on the image of the brand. It has a broader and more
clearly defined scope, which will be further explained in the next sections.
Table 5: Literature Review: Message Strategy Approach
| Authors | Functional | Experiential | Emotional | Brand Resonance |
|---|---|---|---|---|
| Tafesse & Wien (2017) | functional brand posts | experiential brand posts | emotional brand posts | brand resonance |
| Ashley & Tuten (2015) | functional appeals | experiential appeals | emotional appeals | resonance |
| Jahn & Kunz (2012) | functional value | hedonic value | / | / |
| De Vries & Carlson (2014) | functional value | hedonic value | / | / |
| Swani et al. (2014) | functional appeals | / | emotional appeals | brand strategy (corporate brand name & product brand name) |
| Swani et al. (2017) | functional appeals | / | emotional appeals | brand cue (corporate name & product name) |
| Davis et al. (2014) | functional brand consumption | / | emotional brand consumption | / |
| Lee et al. (2018) | / | / | brand personality-related (emotion & emoticon) | directly informative (brand mention) |
| Berger & Milkman (2012) | / | / | emotions | / |
| Tafesse (2015) | / | / | / | brand post consistency |
Swani et al. (2014; 2017) analysed how several message strategies differ between a B2C and a
B2B environment. Their framework consisted of four different message strategy viewpoints: brand
strategy, message appeals, selling strategy and information search. A distinction was made between
functional appeals and emotional appeals. While functional appeals refer to specific product
specifications, emotional appeals aim to evoke consumers’ emotions. The brand strategy
viewpoint (which looks at the differences between mentioning the corporate brand name and mentioning
the product brand name) is most in line with the brand resonance category of our viewpoint. The main
difference is that brand resonance takes a broader view: it not only looks at “names” but also takes the
history of the brand or its slogan into account.
Table 6: Message Strategy Usage (Ashley & Tuten, 2015)
Tafesse & Wien (2017) provided a framework consisting of 12 categories of BP’s. The four
categories of our MSA are also part of this framework. The main difference is that our study focuses on
different classification approaches (viewpoints), while the main focus of the study from Tafesse & Wien
(2017) was on building one comprehensive framework. Jahn & Kunz (2012) classified their “content”
category further into functional value and hedonic value. We do not find the overall theme “content” well
suited to functional and hedonic value, since these two variables fit our MSA better. Furthermore, the
other two category groups of their study (self-oriented and relationship-oriented) will be used in our
next approach, the Marketeer’s Orientation Approach. That is why we have assigned the content category
of Jahn & Kunz (2012) to this model and not to the previous Content Approach. Davis et al. (2014)
identified five core drivers that represent consumers’ motivation for brand consumption in a SM
community. The model consists of functional and emotional brand consumption (which are applicable to
our MSA), self-oriented and relational brand consumption (which are relevant for the MOA) and social
brand consumption. The study from Davis et al. (2014) will also help us to better understand the
different categories.
5.2 Classification variables
Table 7: Common message themes of each classification variable: Message Strategy Approach
| Variable | Common message themes |
|---|---|
| Functional | Product & service functional claims, product reviews, awards, green credentials etc. |
| Experiential | Sensory stimulation (e.g. visual, auditory, taste, odour etc.), physical stimulation (e.g. physical actions, performances, activities etc.) & brand events (product launches, festivals, fan events, sponsored events) etc. |
| Emotional | Emotion-laden language (sentiment analysis) |
| Brand Resonance | Brand image (e.g. brand logo, brand slogan, brand character), photos of branded products, celebrity association, brand history etc. |
5.2.1 Functional
Functional BP’s are posts that highlight the functional attributes of a company’s products and services.
These kinds of posts focus on promoting the benefits of company products and services according to
performance, quality, affordability, efficiency, design & style criteria (Tafesse & Wien, 2017).
Functional BP’s can have an internal or external orientation. Internal-oriented functional posts focus on
product attributes and benefits that the company claims itself. An example: “We would like to
introduce ourselves to our new computers, which have a higher processor than you could ever dream
of!” External-oriented functional posts present benefits claimed by external reviewers, which the company
would like to share with its consumers. Davis et al. (2014) found that the main drivers for consumers’
functional brand consumption are to solve problems, send specific inquiries, search for information,
evaluate the service before purchasing and gain access to specific deals. De Vries & Carlson (2014) found
that the functional value of the BFP positively influences the intensity of using the BFP. Consumers were
more likely to interact online when they perceived the information as useful. Ashley & Tuten (2015),
who focused on message strategies, found that the correlation between functional appeals and
engagement score was insignificant. They defined a functional appeal as the utility or functionality of the
product or service. Swani et al. (2017) found a positive (but very small) relationship
between functional appeal posts and likes but a negative relationship with the number of comments.
This kind of post creates fewer emotional impulses to react to the post.
5.2.2 Experiential
Experiential BP’s “evoke consumers’ sensory and behavioural responses. They highlight the sensory
and embodied qualities of the brand and often associate the brand with pleasurable consumer
experiences.” (Tafesse & Wien, 2017). They are further classified into three subcategories. (1) Sensory
stimulation focuses mostly on the five senses (e.g. visual, taste, odour etc.). (2) Physical stimulation
employs behavioural brand cues to amplify the physical qualities of the brand. A good example is when
Toyota posted a video of their new model combined with footage of extreme sportsmen, highlighting
the good physical quality of the car. (3) Brand events can be events such as product launches, fan events,
sport events etc. (Tafesse & Wien, 2017). Ashley & Tuten (2015) found a significant positive correlation
between experiential appeals (e.g. how the customer experience relates to sight, sound,
taste, touch or smell) and engagement score.
5.2.3 Emotional
Emotional BP’s aim to evoke consumer emotions. Most of the time they make use of emotion-laden
language, which evokes positive or negative feelings in consumers. An example: “I have
a terrifying bad day!” The words “terrifying” (derived from “terrify”) and “bad” are both negatively
emotionally charged (Tafesse & Wien, 2017). Besides emotion-laden language, Tafesse & Wien (2017)
also referred to emotional storytelling and humour-related posts as emotional posts. One of the five
drivers from Davis et al. (2014) for connecting to a brand is emotional brand consumption, which focuses
on enjoyable interactions (e.g. feeling privileged, being recognized by the brand or satisfaction of
curiosity). Ashley & Tuten (2015) found a negative correlation between the Engagement Score and
emotional appeals, which mainly focused on how the customer will feel. De Vries et al. (2012)
concluded that the share of positive comments has a positive effect on BPP (likes & comments), while
the share of negative comments only has a positive effect on comments (so not on likes). This is possibly
due to the fact that people want to confirm other people’s opinions or disagree and counter-react to
someone’s opinion rather than like the post. Berger & Milkman (2012) found that articles that evoke
emotions such as awe, anger, anxiety or sadness are more likely to become viral than non-emotional
articles. We may say that articles with positive or negative content go more viral. Swani et al. (2017)
found that posts containing emotional appeals have a positive impact on BPP compared to
non-emotional appeals. A possible explanation for the negative correlation between emotional appeals and
the engagement score in Ashley & Tuten (2015) is that this engagement score comes from Engagement
dB, which is something different from BPP (likes & comments). Lee et al. (2018) found a positive effect
of posts that express emotions on CE (likes & comments). So, if a BP is emotional, the motivation of
a fan to participate in the content of a brand is met.
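The idea of emotion-laden language can be illustrated with a small lexicon lookup applied to the example post above. The mini-lexicon, its polarity values and the function names are hypothetical and chosen only for this sketch. An actual analysis would rely on an established sentiment lexicon or a trained model.

```python
import re

# Hypothetical mini-lexicon: word -> polarity (-1 negative, +1 positive).
EMOTION_LEXICON = {
    "terrifying": -1,
    "bad": -1,
    "sad": -1,
    "angry": -1,
    "happy": 1,
    "love": 1,
    "great": 1,
}


def emotion_words(post):
    """Return (word, polarity) pairs for every lexicon word found in the post."""
    tokens = re.findall(r"[a-z']+", post.lower())
    return [(token, EMOTION_LEXICON[token]) for token in tokens if token in EMOTION_LEXICON]


def is_emotional(post):
    """Flag a post as emotional if it contains at least one lexicon word."""
    return len(emotion_words(post)) > 0


print(emotion_words("I have a terrifying bad day!"))
```

A lookup like this only captures emotionally charged words. Emotional storytelling or humour, which Tafesse & Wien (2017) also count as emotional posts, would not be detected this way.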
5.2.4 Brand Resonance
Brand resonance posts “are posts that direct attention to the brand promise and identity of the focal
brand” (Tafesse & Wien, 2017, p.9). The main focus is on brand image, brand personality, brand
association and branded products, with the main goal of influencing consumers’ brand attitude. Brand
image posts include the brand slogan, logo, brand name, aesthetic features, values or characteristics
(Tafesse, 2015). Red Bull, for example, uses its campaign slogan “Gives you wings” a lot in its
posts. The second variant shows photos of branded products. BMW posting a close-up of its new
model is an example of this. The third variant includes posts involving celebrities and influencers.
When we think about Nespresso, we immediately link it to George Clooney. Nespresso uses this
association a lot when posting new content. A last possible variant involves posts about the brand history
(Tafesse & Wien, 2017). Messages containing the corporate brand name have a positive relationship
with the number of comments but a negative relationship with the number of likes (Swani et al., 2017).
Lee et al. (2018) confirmed the negative relationship between posts containing a specific brand or
organization name and the number of likes but also found a negative relationship with the number of
comments. Tafesse (2015) found a positive relationship between BP consistency and audience response
(number of likes & shares of a BP). Tafesse (2015) referred to brand consistency as developing a uniform
organisational identity with a consistent brand position by making use of the brand’s name, logo, slogan,
values & aesthetic features in the BP’s. So brand consistency and brand resonance can be seen as
synonyms, while posts that contain the brand name are only a small subdivision of the brand
resonance category.
6 Marketeer’s Orientation Approach
6.1 Literature review
The third approach classifies BP’s into task-oriented content posts, relationship-oriented content posts
(social & brand interaction) & self-oriented (self-concept) content posts. This classification approach is
derived from the research from Kim et al. (2015), which focused on the marketeer’s perception of
customers’ engagement. The categorization is based on the salesmanship literature. It looks at which
orientations of communication a salesperson can take. Firstly, a salesperson can
make use of task-oriented communication, which is highly goal-oriented. Secondly, salespersons can
focus on socializing and building personal relationships through interaction-oriented communication.
Thirdly, salespersons who use self-oriented communication make use of personal attributes or
experiences while communicating with others. Kim et al. (2015) adapted this salesmanship viewpoint
to a marketing viewpoint on social BP’s.
Table 8: Literature Review: Marketeer's Orientation Approach
| Authors | Task-oriented | Relationship/interaction-oriented | Self-oriented |
|---|---|---|---|
| Kim et al. (2015) | task-oriented | interaction-oriented | self-oriented |
| Jahn & Kunz (2012) | / | relationship-oriented (social interaction value & brand interaction value) | self-oriented (self-concept value) |
| Davis et al. (2014) | / | relational brand consumption | self-oriented brand consumption |
| Swani et al. (2017) | / | customer relationship | / |
| Ashley & Tuten (2015) | / | interactivity | / |
| Stephen et al. (2015) | / | calls to action | / |
| de Vries et al. (2012) | / | social value & co-creation value | / |
This approach is similar to previous research that focused on consumers’ perceived scheme through
the U&G theory, which classified BP’s into content-oriented posts (functional & hedonic: fun &
enjoyment), relationship-oriented posts (social & brand interactivity) & self-oriented posts
(self-concept) (Jahn & Kunz, 2012; Tafesse & Wien, 2017). We have not adopted the first, content-oriented
category in our MOA for several reasons. In the first place, we think that the CA is a valuable approach
on its own, as we have included it in our study as a separate approach. Secondly, Jahn & Kunz (2012)
divided the content-oriented viewpoint into functional and hedonic value. In our opinion, the
functional category is more suitable for the MSA than for the MOA. Thirdly, the category hedonic
value is associated with a consumer’s perceived fun, pleasure and entertainment, which is more suitable
for the experiential category of the MSA. We can draw the same conclusion for the study from de Vries
& Carlson (2014), who used the classification from Jahn & Kunz (2012) and adjusted it
slightly. They also included functional & hedonic value in their framework but did not group them
together as content-oriented. So, there is too much overlap between the content-oriented category and
other approaches. That is why we have adopted the framework from Kim et al. (2015) in our study. Our
model will focus on what content drives marketeers to increase BPP. Another remark worth mentioning
is that the studies from de Vries & Carlson (2014) and Jahn & Kunz (2012) were based on qualitative
surveys, so they did not really look at what content is stated in the post. Although Tafesse
& Wien (2017) linked the study from Gummerus et al. (2012) to this approach, we would rather not link
them to each other. First of all, Gummerus et al. (2012) studied the effects of
behaviours on perceived benefits and outcomes. So, it does not look at the content of the social posts,
since it is qualitative research. Secondly, the perceived benefits are social, entertainment and economic,
which are not directly linked to the MOA. The perceived entertainment benefits would rather be linked
to the entertainment category of our CA.
6.2 Classification variables
Table 9: Common message themes of each classification variable: Marketeer's Orientation Approach
| Variable | Common message themes |
|---|---|
| Task-oriented | Advertising, announcements of new products or services, coupons, discounts, sweepstakes etc. |
| Relationship-oriented (interactivity) | Customer feedback, links, voting, call to act, contest, quiz, customer testimony, customer reviews, customer services, Q&A etc. |
| Self-oriented | Friends, family, personal preferences, anecdotes and future plans etc. |
6.2.1 Task-oriented
The first characteristic that could explain marketeers’ motivation for online content
creation is based on a task-oriented viewpoint. Previous research focused more on the customer
perspective (de Vries & Carlson, 2014). Task-oriented posts aim to increase sales or BPP through
traditional advertising. Advertising a certain brand or product through a persuasive message with
visuals, a new announcement about a product or service & online coupons, discounts, contests or
sweepstakes are some examples of task-oriented content. Task-oriented content was found to have
a significantly positive impact on the number of likes, comments and shares (Kim et al., 2015). It even
had a bigger impact on likes, comments & shares than interaction- and self-oriented content. In
the qualitative research from Jahn & Kunz (2012), the content-oriented variable functional value was
significantly positively related to fan page usage intensity. Therefore, we suggest that task-oriented
content will have a positive relationship with BPP.
6.2.2 Relationship/Interaction-oriented
Customer relationship posts can be defined as “posts that solicit information and feedback about customer
needs, expectations and experiences” (Tafesse & Wien, 2017, p.10). The main focus of
relationship-oriented posts is thus on social & brand interactivity (Jahn & Kunz, 2012). Tafesse (2015)
used BP interactivity in his model, which focuses on “the degree to which two or more communication
parties can act on each other” (Tafesse, 2015, p.931). Interaction-oriented content focuses on making the
relationship between customers and a brand stronger. Marketeers can post content about a personal
statement, a celebration, an opinion, the weather or entertainment. Furthermore, relationship posts can
ask for likes, comments or shares (Kim et al., 2015). Interactivity on SM is a two-way communication
between the brand and the consumers, as well as between the consumers themselves. Tafesse & Wien
(2017) further classified these posts into three categories: (1) customer service posts, which make common
service announcements and reminders; (2) customer testimonial posts, which highlight previous
customer success stories; and (3) customer feedback posts, which ask through a Q&A for an opinion
about a brand’s products or services. Davis et al. (2014) described social brand consumption as the social
interaction between the consumers within a community (e.g. experience exchange, community
attachment, building links & social interaction), while relational brand consumption focuses on the
interaction between the brand and the consumers (e.g. co-creation of services, the desire to know the real
people behind the brand & the desire for personalized interaction with the brand). We incorporate these
two divisions into one relationship-oriented category. Ashley & Tuten (2015) described interactivity as
the degree to which consumers can actively participate and engage with the brand.
De Vries & Carlson (2014) found a positive effect of social interaction value & co-creation value of
brand posts on CE. Some previous research already analysed the impact of interactivity content on BPP
(Cvijikj & Michahelles, 2013; de Vries et al., 2012). A remark here is that these studies focused more
on specific elements that could appear in a post (Media Type Approach) (e.g. question mark,
photo, link etc.). Our research will take a broader view, focusing on the content itself. A question in a
post will have a higher degree of social interactivity, since it encourages people to react to the post, while
a link to another website will have a lower degree of interactivity (de Vries et al., 2012). Posts with a
higher degree of interactivity (e.g. contest or question) enhance BPP more strongly. An
exception is questions, which have a negative impact on likes, since they encourage people to answer
the question rather than like the post. In addition, posts that contain a link have a negative effect on
comments since, most of the time, people click on the link and do not come back to the specific BP.
This was also confirmed by Cvijikj & Michahelles (2013), where posts containing a picture or a status
had a positive effect on BPP compared to posts containing a link, which has a higher degree of
interactivity. But our focus is on the content of the post and not on the media types of the posts. Posts
that make an effort to strengthen the customer-brand relationship through interactive communication
are more likely to increase online engagement. Posts that ask for engagement through
specific questions or by requesting likes/comments/shares etc. have a positive impact on the number of
likes and comments (Stephen et al., 2015), while Tafesse (2015) found a weak negative
relationship between BP interactivity and the number of likes and shares of a BP. Interaction-oriented
content has a positive impact on the number of likes, comments and shares compared to
non-interaction-oriented posts, based on the study from Kim et al. (2015). The qualitative research from
Jahn & Kunz (2012) found a positive relationship between the relationship-oriented viewpoint and the
intensity of fan page engagement.
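The interactivity cues discussed above (a question, a link, or an explicit request for likes, comments or shares) can be detected with a few pattern checks. The sketch below is a minimal illustration. The regular expressions, the list of request phrases and the function name are assumptions for this example and are not taken from the cited studies.

```python
import re

# Hypothetical patterns for two of the cues mentioned in the text.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
ENGAGEMENT_REQUEST = re.compile(
    r"\b(like|comment|share|vote|tell us|let us know)\b", re.IGNORECASE
)


def interactivity_cues(post):
    """Return simple binary cues: question, link and explicit engagement request."""
    return {
        "question": "?" in post,
        "link": bool(URL_PATTERN.search(post)),
        "engagement_request": bool(ENGAGEMENT_REQUEST.search(post)),
    }


print(interactivity_cues("Which flavour do you prefer? Tell us in the comments!"))
```

These cues describe elements that appear in a post, which is closer to the Media Type Approach. They could complement, but not replace, a classification that looks at the content of the post itself.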
6.2.3 Self-oriented
Self-oriented content includes news, information or a story about the company or its products, or an
event, program or campaign that is sponsored by the company. It can also consist of a media post
(video or picture) of its employees, management or staff (Kim et al., 2015). Our analysis focuses on the
viewpoint of the marketeers and not of the consumers, which gives self-oriented content another
meaning. Still, previous research that focused more on consumers’ perceived intentions helps
us to better understand the classification. Self-oriented posts (customer perspective) focus on the
individual needs of an individual consumer. Jahn & Kunz (2012) concluded that there is a positive
relationship between this variable and fan page engagement. Tafesse & Wien (2017) defined personal BP’s
as “posts that center around consumers’ personal relationships, preference, and/or experience which can
invoke personally meaningful themes (family, friendship, personal anecdotes or future plans to initiate
deeply personal conversations with consumers)” (Tafesse & Wien, 2017, p.10). We will adapt the
definition from Tafesse & Wien (2017), which focused on the customer viewpoint, to the marketeer’s
viewpoint, since our research focuses on MGC. This has resulted in the following definition of
self-oriented content. Self-oriented content is content around the brand itself, preference, and/or
experience with personal themes (employees, staff, consumers’ relationship, management, company
anecdotes or future plans). It refers to marketeers who post about a company’s personal feelings,
anecdotes or opinions. An example of such a post could be “Today, we are very happy to announce that
our cousin will join our bartender’s team!” Davis et al. (2014) classified self-oriented brand consumption
further into self-actualization, self-perception enhancement and self-branding. Kim et al. (2015) found a
positive relationship between self-oriented content and the number of likes, comments & shares.
7 Viral Marketing Rules Approach
7.1 Literature review
The VMRA categorizes posts into event, product, promotion and entertainment. This classification, based
on the viral marketing rules, focuses on how to increase the spread of a specific post, which can
serve different marketing objectives (e.g. a product launch). An exploratory study by Shen &
Bissell (2013) used this classification to analyse the factors that influence brand loyalty in the
beauty industry. This approach is worth mentioning as a first evaluation. In our view, this framework
is less suitable to place next to the previously mentioned approaches. First of all,
entertainment is also part of our content classification approach. Second, promotion is an example of the
transaction category of the CA, which has a broader scope. Third, the scope of the product category is
too narrow in our opinion. It can refer to information about a product (CA – information category) as
well as to product claims (functional - MSA). So, given the approaches we have stated and the large
overlap with them, this approach does not fit next to the other approaches. Still, it can be valuable on
its own for companies that want to classify their posts according to the viral marketing rules, ignoring
the previous approaches we have stated.
7.2 Classification variables
7.2.1 Event
A (current) event post “focuses on themes that capture active talking points that target audience, such
as cultural events, holidays, anniversaries, and the weather/season” (Tafesse & Wien, 2017, p.10).
Cultural events can include topics like TV shows, film releases, sport competitions etc. Shen & Bissell
(2013) focused on the sharing of a calendar, which is a broader viewpoint than that of Tafesse & Wien
(2017), who did not incorporate brand events in their event category. They further
classified it into 4 time-oriented subcategories: an event from the past, today, tomorrow or in the future
can be shared. An example: “Tomorrow everybody is welcome to our annual university drink!” In our
research we will focus on the broad concept of an event, also including brand events in the event category.
Table 10: Common message themes of each classification variable: Viral Marketing Rules Approach
Variable | Common message themes
Event | Brand events (e.g. product launches, festivals, fan events, sponsored events etc.), cultural events (e.g. sport, film, TV shows), holidays, special days (e.g. anniversary) & weather.
Product | Product launch, reviews, opinions, tips.
Promotion | Price discounts, coupons, discount code, giveaways, customer contests, product competitions, sample, gift with purchase.
Entertainment | Funny, humorous items, artistic works, events.
Stephen et al. (2015) found a negative relationship between posts that refer to a major or minor holiday
and the number of comments, but a positive relationship with the number of likes. A remarkable
conclusion from the study by Lee et al. (2018) is that posts that mention holidays have a large negative
impact on BPP. Our event category will be much broader than only mentioning holidays. But we agree
that event posts will most of the time make people happy, since they can become excited about a
specific event or anniversary.
7.2.2 Product
Posts categorized as product contain product-related information about a product launch or extension,
reviews, benefits, uses (how & when) or tips (Shen & Bissell, 2013; Stephen et al., 2015). An example:
“Our new model X is twice as fast compared to our previous model Y!” Stephen et al. (2015) found a
positive relationship between product posts & the number of likes & comments. In contrast, followers who
receive a message containing a product brand name are less willing to like or comment on it
than on posts that do not include the product brand name (Lee et al., 2018; Swani et al., 2017).
These studies are more recent than the study by Stephen et al. (2015), which found a positive
relationship with BPP.
7.2.3 Promotion
Promotion posts aim to stimulate consumer demand and to persuade consumers to take action
towards a buying decision (e.g. giveaway, coupon/discount code, sample/gift with purchase, comparison
to competition) (Shen & Bissell, 2013; Tafesse & Wien, 2017). Sometimes these posts contain
links to pages where followers can make use of promotional offers or sign up for a competition
(Ashley & Tuten, 2015). An example of a promotion post is “tag your best friend and win a free
dinner for 2 persons!” Previous research by Cvijikj & Michahelles (2013) found that remuneration
posts have a negative impact on the like ratio but a positive impact on the comment ratio. This could be
because followers who are specifically asked to tag someone in a comment will comment on the post
instead of liking it. Another possibility is that the post becomes irrelevant once the winner of a
contest has been announced. Posts containing deals (discounts and freebies) have a negative relationship with
BPP (Lee et al., 2018). Stephen et al. (2015) also found that posts which contain value information (e.g.
pricing, discounts, coupons) and posts that ask followers to enter a competition (through sweepstakes or
giveaways) have a negative impact on the number of likes & comments.
7.2.4 Entertainment
We refer to the entertainment category of model 1, which uses the same “entertainment” variable.
8 Unsupervised Approach
The previously mentioned approaches made use of a SL method to classify messages into the right category
(Ashley & Tuten, 2015; Berger & Milkman, 2012; Lee et al., 2018; Stephen et al., 2015; Swani et al.,
2014). SL is an expensive, time-consuming approach, but its performance will be higher compared to
UL. On the other hand, UL tries to get meaningful insights out of unstructured text data without human
involvement to label the variables (Netzer et al., 2012; Zhang et al., 2017). Secondly, UL is quite
new in SM content analysis, while supervised learning has been used commonly. Zhao et al. (2011)
looked at which topics appeared on Twitter compared to the New York Times (NYT) without
checking the relationship with engagement. The study from Netzer et al. (2012) focused on
understanding the large volume of consumer-generated data through text-mining analysis and a network
analysis framework. It has modelled the role of message content and influencers in SM rebroadcasting.
To reach more people, companies try to write more specific social media messages that are more likely
to be rebroadcasted. A follower can share a company’s post with his/her friends or retweet a “tweet” to
his/her followers, which expands the reach of the original message. To check the underlying
dimensions, the study made use of a factor analysis applied to a large binary data matrix, in which each
entry is one if a word from the word bank is included in the message and zero if it is
not (De Pelsmacker & Van Kenhove, 2007). Three factors were found with an eigenvalue greater than
1 (School: school, mba, prof etc.; Finance: equity, sector, fund etc.; Politics: tax, votes, Obama etc.).
The study concluded that rebroadcasting activity depends on the content of the message. That is why
marketeers should focus on posting messages about topics that are more likely to be rebroadcasted.
Specific to this study, school- and politics-oriented messages are more likely to be rebroadcasted than
finance-oriented messages. Compared to the study from Zhang et al. (2017), the study from Netzer et
al. (2012) had a broader range. The application was demonstrated by building a network on sedan car
and diabetes drug forums. Although it is useful to see how UL has been used in content classification,
it is not useful to compare the different classification frameworks and try to come up with one framework,
since UL depends on the characteristics of the data. The three themes (school, finance and politics) are
inherent to the business school data coming from the study from Netzer et al. (2012).
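The binary input matrix for such a factor analysis can be sketched as follows. This is a minimal Python
illustration and not the code used in the cited study: the word bank, the messages and the function names
are hypothetical, and tokenisation is reduced to lower-casing and splitting on whitespace. The factor
extraction itself is not shown.

```python
# Hypothetical word bank and messages, for illustration only.
WORD_BANK = ["school", "mba", "equity", "fund", "tax", "votes"]

MESSAGES = [
    "Our MBA school opens applications today",
    "Equity fund performance this quarter",
    "New tax proposal goes to votes",
]


def tokenize(message):
    """Lower-case a message and split it on whitespace (simplifying assumption)."""
    return set(message.lower().split())


def binary_matrix(messages, word_bank):
    """Return one row per message: 1 if the word-bank term occurs in it, else 0."""
    rows = []
    for message in messages:
        tokens = tokenize(message)
        rows.append([1 if word in tokens else 0 for word in word_bank])
    return rows


matrix = binary_matrix(MESSAGES, WORD_BANK)
for message, row in zip(MESSAGES, matrix):
    print(row, message)
```

Each row of the matrix is one message and each column one word-bank term, which is the layout a factor
analysis would take as input.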
9 Media Type Approach
The last approach looks at the extent to which different types of media have been used in previous
classification frameworks. Our data consist of Facebook posts without the possibility to see whether a
picture or video was added to the post. The media approach will therefore not be applied to our data,
but it is worth mentioning how this approach has been used and whether there is a “best” approach for
media classification. Secondly, the main focus of our research is on the different (content) approaches
and not on different media approaches. Still, it is worth mentioning how media elements have been used
in order to arrive at a well-established media approach for further research. Media elements have been
commonly used in previous frameworks. Some literature made use of text, photo, videos, links etc. (Kim
et al., 2015; Sabate et al., 2014), while mostly later research came up with newer terms and looked at the
interactivity and vividness of each type of media element (Cvijikj & Michahelles, 2013; L. de Vries et
al., 2012). Our MTA consists of vividness and interactivity, which will be further explained. A
literature review of the MTA can be found in Table 11.
Table 11: Literature Review: Media Type Approach
Authors | Vividness (V) | Interactivity (I)
de Vries et al. (2012) | 3 levels: (1) low: pictorial (photo or image), (2) medium: event (application at the brand page that announces an upcoming (offline) event of the brand) and (3) high: video (mainly videos from YouTube) | questions & links
Tafesse (2015) | 3 levels: high (video), moderate (2 images) and low (0/1 images) | 3 levels: high, moderate and low
Sabate et al. (2014) | richness: images, videos and links
Kim et al. (2015) | text, photo or video
Ashley & Tuten (2015) | animation (motion)
Lee et al. (2018) | message type: app, link, photo, status update or video
Stephen et al. (2015) | rich media: images and videos & URLs: links
Cvijikj & Michahelles (2013) | 4 levels: (1) photos (V = low, I = low), (2) status (V = no, I = low), (3) video (V = high, I = high) and (4) link (V = medium, I = high)
9.1 Interactivity
A first way of increasing the importance of a BP is interactivity. It is “the degree to which two or more
communication parties can act on each other, on the communication medium, and on the messages and
the degree to which such influences are synchronized” (Liu & Shrum, 2002, p.54). We also refer to
interactivity in the MOA. As mentioned before, the main difference is that here we look at which media
types drive interactivity, while relationship/interaction looks at what content can be found in the post.
These variables are clearly closely interrelated and support each other. Posts that contain
questions or links to a website have a higher degree of interactivity than content-only posts. The
greater the possibility to get involved in the post (links, comment options, surveys…), the higher
the interactivity value. Research showed that posts that contain a question have a negative relationship
with likes but a positive relationship with comments (de Vries et al., 2012). A later study by
Tafesse (2015), however, found a negative relationship between BP interactivity and both likes and shares.
9.2 Vividness
Another variable which has commonly been used as a media type is vividness. It indicates the
degree to which a BP stimulates the 5 different senses. For example, a video has a higher vividness
than a photo, since it stimulates not only sight but also hearing. The study from de Vries et al. (2012)
classified vividness into 3 levels (low: pictorial, medium: event and high: video). A later study from
Cvijikj & Michahelles (2013) went a step further and combined vividness and interactivity for each
type of media. 4 types of media were taken into account: (1) photos (V = low, I = low), (2) status (V =
no, I = low), (3) video (V = high, I = high) and (4) link (V = medium, I = high). Analysis showed that
low-interactivity posts (i.e. photos and status updates) increase the total level of engagement, while
vivid content (i.e. videos, photos and links) increases the reach of the message. The study from Sabate
et al. (2014) also focused on BPP in terms of number of likes and comments. But instead of vividness and
interactivity, the independent variable (IV) of this research is the richness of a BP, which takes images,
videos and links into consideration (Figure 5).
Figure 5: Conceptual Framework (Sabate et al., 2014)
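The combined coding of Cvijikj & Michahelles (2013) described above can be written down as a small lookup
table. The sketch below is only an illustration in Python: the levels are those reported in the text,
while the class, dictionary and function names are hypothetical.

```python
from typing import NamedTuple


class MediaCoding(NamedTuple):
    vividness: str
    interactivity: str


# Levels as reported in the text for Cvijikj & Michahelles (2013).
MEDIA_CODING = {
    "photo": MediaCoding(vividness="low", interactivity="low"),
    "status": MediaCoding(vividness="no", interactivity="low"),
    "video": MediaCoding(vividness="high", interactivity="high"),
    "link": MediaCoding(vividness="medium", interactivity="high"),
}


def low_interactivity_types():
    """Media types coded with low interactivity."""
    return sorted(name for name, coding in MEDIA_CODING.items()
                  if coding.interactivity == "low")


def vivid_types():
    """Media types with any vividness level other than 'no'."""
    return sorted(name for name, coding in MEDIA_CODING.items()
                  if coding.vividness != "no")


print(low_interactivity_types())
print(vivid_types())
```

The two helper functions reproduce the groupings used in the reported findings: photos and status updates
as low-interactivity posts, and videos, photos and links as vivid content.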
Stephen et al. (2015) also used this rich media category as a media element, but treated URLs (links to
other websites) as a separate media element. Richness of the content in terms of images and videos
increases the impact in terms of likes, while videos have no effect on the likelihood of receiving more
comments on a post. To increase the number of comments, marketeers can publish posts with images or
leave out links, since links have a negative influence on the number of comments on a post.
Another interesting remark is that including images seems to have a more powerful impact on CE than
videos, since images have an impact on likes as well as on comments. While the study from de Vries et
al. (2012) concluded that vividness has a positive impact on the number of likes and Cvijikj &
Michahelles (2013) concluded that vividness increases the reach of the message, the study from Stephen
et al. (2015) found little evidence that media elements (rich media, which is comparable
to a high vividness factor, or URLs) have an impact on CE. The same conclusion could be drawn for
mentioning holidays. While marketeers find media elements really important, they sometimes seem to be
ineffective in increasing social engagement with consumers. Later research from Tafesse (2015) found
a positive effect of brand vividness on the number of shares of a BP, but not on the number of
likes. Animation was one of the message strategies in the framework of Ashley & Tuten (2015).
Although little information is given on the extent to which animation is linked to media elements, it takes
another viewpoint compared to previous research (vividness & interactivity). Since our data have no
information about media elements, we will not take this approach into our model.
10 Valence
Sentiment analysis is a type of data mining that measures the sentiment of a piece of text (blogs, reviews,
newspapers, tweets, posts etc.) through natural language processing (‘Sentiment Analysis’, n.d.). It tries
to gain useful insight from complex data and into how posts are emotionally charged. It is a commonly used
technique in content analysis. The valence of a BP has been used as a variable in previous
content classification research and can be calculated through sentiment analysis (de Vries et al.,
2012). It refers to a positively, neutrally or negatively minded post. Setty et al. (2014)
conducted a sentiment analysis (valence approach) on life event posts. Posts were classified into happy,
neutral or sad Facebook posts. To check the sentiment (polarity) of a specific word, they made use of
the SentiWordNet dictionary. Another study, by Hopkins & King (2010), classified blogs about the
American election of 2008 into sentiment categories going from extremely negative (-2) to extremely
positive (2). Berger & Milkman (2012) used positivity (the difference between the percentage of positive
words and negative words in a specific article) as a valence factor, while Goh et al. (2013) used valence
as the net positivity (the number of positive concepts minus the number of negative concepts).
Emotionality, defined as the percentage of words that are classified as positive or negative, was another
variable from Berger & Milkman (2012). Their study also goes beyond mere valence to study how emotions
drive social transmission, taking different emotions into account compared to previous research (de Vries
et al., 2012; Hopkins & King, 2010). The characteristics that were analysed were anger, anxiety, sadness,
awe (the feeling of facing something greater than yourself), positivity & emotionality.
Sentiment analysis is mostly applied through automatic text coding. Each word is checked
against a lexicon (containing thousands of words), which assigns a value of -1, 0 or 1 corresponding to a
sentiment type (negative, indifferent or positive). A second commonly used approach is based on
machine learning, which has a higher accuracy but is also more time consuming. Most of the time these
two approaches are combined to increase performance. A variable other than valence or emotion
that has been used in previous research is emotional appeal, which focuses on evoking positive or
negative emotions through specific content used in a BP (Swani et al., 2014, 2017). The difference compared
to valence is that emotional appeal has to be coded by people, while valence can be
coded automatically by sentiment analysis. Since our research does not focus on sentiment analysis, we
will not take valence into one of our proposed frameworks. Still, it is worth mentioning how valence
has been used in past literature for further research.
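The word-level measures described in this section can be illustrated with a short sketch. It is a
simplified Python example under stated assumptions, not the procedure of the cited studies: the
mini-lexicon is hypothetical, words are obtained by splitting on whitespace, and net positivity is counted
over words rather than over the “concepts” used by Goh et al. (2013).

```python
# Hypothetical mini-lexicon: +1 positive, -1 negative; words not listed count as 0.
LEXICON = {"great": 1, "happy": 1, "thank": 1, "sad": -1, "cancelled": -1}


def valence_measures(text, lexicon=LEXICON):
    """Compute positivity, emotionality and net positivity for one text."""
    words = text.lower().split()
    total = len(words)
    positive = sum(1 for word in words if lexicon.get(word, 0) > 0)
    negative = sum(1 for word in words if lexicon.get(word, 0) < 0)
    if total == 0:
        return {"positivity": 0.0, "emotionality": 0.0, "net_positivity": 0}
    return {
        # % positive words minus % negative words
        "positivity": 100 * (positive - negative) / total,
        # % of words that are either positive or negative
        "emotionality": 100 * (positive + negative) / total,
        # count of positive words minus count of negative words
        "net_positivity": positive - negative,
    }


scores = valence_measures("great party tonight but the golf is cancelled so sad")
print(scores)
```

In this example one of the ten words is positive and two are negative, so positivity is -10, emotionality
is 30 and net positivity is -1.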
11 B2B vs B2C
Next, we give a brief overview of the different outcomes of categorization types on BPP that have been
found between B2B and B2C. Our study will focus on the B2C view, since our data come
from bars & restaurants that have online brand pages. B2B refers to a transaction conducted
between one company and another, while in B2C a company sells directly to an individual
consumer. An example of B2B is a company selling parts of a car engine to a car
manufacturing company. If you buy a new cell phone at Fnac, we are talking about B2C. The
focus of B2C is on customer needs, met through products or services, while the focus of B2B
is more on improving companies’ operations through services or products supplied to other businesses
(Chen, 2019).
Figure 6: Conceptual Framework (Swani et al., 2017)
A study from Swani et al. (2017) evaluated the popularity of SM posts, comparing B2B to
B2C. The framework is based on SM message content strategies, which consist of brand cues
(corporate name and product name), message appeal (functional and emotional), selling strategy and
information search, and which came from the authors’ previous research in 2014. The DV is the popularity
of SM messages (likes and comments) (Figure 6). Market type is the moderator, which can strengthen or
weaken the relationship between SM message content strategies and the popularity of SM
messages. Control variables taken into account are the size of the Facebook fan base and the message
time (time between the posting of the message and the storage of the post). While the B2B environment is
characterized by highly involved and rational situations (high level of cognition), the B2C view is
characterized by less involvement and more emotional triggers (low level of cognition). This is also
confirmed by their previous research in 2014, where B2B tweets have a more functional message
appeal while in a B2C environment, tweets have a more emotional appeal. Secondly, B2B marketeers
focus more on corporate brand strategies, but product brand strategies appear equally often in
B2B and B2C. Thirdly, direct calls to purchase (“hard sells”) are more commonly used in a B2C
environment. Finally, embedded links and cues, as well as hashtags, are more likely to be used in B2B
tweets than B2C tweets. In addition, characteristics of B2B and B2C can change over time. The results
from 2016 indicate that B2B message posts have a higher number of message likes compared to B2C
messages, but a lower number of comments. This indicates that in a
B2C environment, people are more likely to comment on message posts. Moreover, the inclusion of
functional and emotional appeals, corporate brand names and information search enhances the popularity
of B2B brand posts compared to B2C brand posts (Swani et al., 2017).
12 Human classification coding
Our model (which will be explained in the next section) consists of 5 different approaches. 4 out
of the 5 approaches (BCA, CA, MSA & MOA) make use of a SL approach, which means that human
judgement is involved to classify the different variables. Before going deeper into the methodology, we
first want to give our thoughts and remarks on the human coding of the first 1000 posts of our dataset.
Some categories were more straightforward to classify than others. The 4 supervised
approaches which we will use in this research have been described in depth in previous sections. We refer
to Table 2 and the literature review for a better understanding of the different variables which are used
in each classification framework. As mentioned before, we did not take the VMRA into our models
since this model has too much overlap with the other approaches. The labelled dataset will be used to
test the performance of the different classification approaches as well as to build a prediction model,
which will be used to classify all messages of our data. Table 12 (p. 35) gives an overview of how many
posts were classified into each variable of each approach, together with an example of a message
from our data. The coder based his classification on the literature review and the explanation of the
different variables in previous sections. Facebook posts with no message text were coded as zero for
every category. Some posts had no text because they consisted only of a media
element (e.g. a shared video or image). As mentioned before, our dataset did not include the media
elements of the Facebook posts.
12.1 (Base) Content Approach
The Base Content Approach can be derived from the Content Approach. First of all, if the post was
coded as entertainment or as transaction, then the post was assigned to the transaction/entertainment
category of the BCA. Secondly, since the information category is the same in the BCA as in the CA, there
was no need for human coding of the information category again. 631 posts were classified as
entertainment. Since our data come from bars and restaurants, most of the time they post about new
upcoming events. Even though some previous research mentioned that entertainment is sometimes
unrelated to the brand, this was rarely the case in our data. Information posts share valuable
content with customers, for example the state of the weather and whether it will be possible to play
golf, or a bar that is looking for new bartenders. If we applied this category to a company that
manufactures its own products, it would be more about product specifications, product
reviews or product recommendations. The third and last category is transaction. Applied to our data,
transaction posts are about deals on drinks or food, contests to win prizes, sweepstakes etc.
Table 12: Overview Human Coding Classification
Approach | Variable | Frequency | Relative Frequency | Message example
Base Content Approach | MD0_INF | 243 | 24% | Same as MD1_INF.
Base Content Approach | MD0_TRA.ENT | 575 | 58% | MD1_ENT or MD1_TRA.
Content Approach | MD1_INF | 243 | 24% | ....ATTENTION GOLFERS....GOLF FOR TONIGHT HAS BEEN CANCELLED. YOU WILL NOT NOT HAVE TO MAKE IT UP, BUT IF YOUR BORED STOP DOWN AND HAVE A COLD ONE!
Content Approach | MD1_ENT | 631 | 63% | The neighborly bar will be hosting a Celebaration of life, for our dear friend Kevin Coffey, Thursday December 15, from 3-7, please join us and his sisters Kim and Lisa
Content Approach | MD1_TRA | 352 | 35% | ******CONTEST TIME****** WERE GIVING AWAY FREE 7 TICKETS ($100 VALUE) TO OUR ICE RAFFLE THE DAY OF THIS CONTEST. TO BE ENTERED INTO THE DRAWING, ALL YOU HAVE TO DO IS LIKE THE NEIGHBORLY BAR PAGE, COMMENT ON THIS POST (On the Neighborly Bar's Facebook page) YOUR FAVORITE BAIT TO USE, AND SHARE THIS POST! Last day to enter is Sunday Jan. 31ST! WINNER WILL BE ANNOUNCED MONDAY FEB. 1ST!
Message Strategy Approach | MD2_FUN | 22 | 2% | Dancing Saturday night to the greatest band in the northland Gypsy Road 9-1, Neighborly bar
Message Strategy Approach | MD2_EXP | 778 | 78% | GYPSY ROAD TONIGHT!!! 9-1. Cmon down and dance!!!
Message Strategy Approach | MD2_EMO | 68 | 7% | a SUPER THANK YOU, to the lovely ladies who cooked for the packer game for me, I THANK-YOU SO MUCH, for all the great food and cake, also CONGRATS to KEVIN BRAVICK, he won the trash talk raffle for $660.00!!!! what a great game and wonderful party, THANK YOU EVERYONE!!!!
Message Strategy Approach | MD2_BRA | 57 | 6% | With a heavy heart we say goodbye to our GM ,Jed Miller today, so many great memories, we will all miss you
Marketeer's Orientation Approach | MD3_TAS | 665 | 67% | IF YOU SEE OUR BEAUTIFUL BARTENDERS MARIAH, AND ASHLEY, TODAY WISH THEM BOTH A HAPPY BIRTHDAY!!!
Marketeer's Orientation Approach | MD3_REL | 98 | 10% | WELL, ITS SUMMERTIME, TIME TO HAVE A PARTY! a BIG PARTY, BIGGER THAN THE LAST 29!!!! ITS TIME TO KICK OFF OUR 30TH YEAR WITH OUR ANNUAL SUMMER PARTY. THIS ONE IS GOING TO BE BIGGER AND BETTER THAN THE REST, AND THE PRIZES ARE AMAZING! STOP IN FOR DETAILS! PLEASE SHARE!
Marketeer's Orientation Approach | MD3_SEL | 79 | 8% | THANK YOU TO EVERYONE WHO ATTENDED OUR 30TH SUMMER PARTY, ALSO OUR FAMILY, FRIENDS, FOR ALL YOUR HELP, AND OUR WONDERFUL HARDWORKING BARTENDERS, THE BANDS, BOOTH WORKERS, STOCK BOYS, THE LIONS CLUB, AND MOST OF ALL----MOTHER NATURE, THANKS FOR HOLDING OUT TILL IT WAS OVER!!!
12.2 Message Strategy Approach
The MSA consists of functional, experiential, emotional and brand resonance. Although this is a
well-established approach on its own, we still have some remarks on why we think it is not the most
applicable approach to our data. First of all, our data come from restaurants, bars and local sport
companies. Most of the time, they post about their specific promotions, events or sweepstakes. These
kinds of companies mostly offer services to their customers and do not have a “product” to deliver.
This makes it difficult to classify posts into the functional category, which mainly focuses on the
functional claims of services or products. Based on our data, these kinds of companies do not even post
about their “best service claims.” We only categorized posts into the functional category if the post is
really about making a claim with regard to a great service or the drinks/food they deliver. We did not
categorize experiential posts as functional, even if they contained an adjective that is
used to make the post more attractive. Such an adjective can be seen as a functional claim, but it is not
the core message of the post. Only when the main focus of the post is a functional claim about a product
or service did we categorize it as functional.
The framework was dominated by experiential posts, which made up 778 of the 1000 posts
in the labelled dataset. Experiential posts are mainly about the promotion of events which can prompt
physical action: they want to stimulate customers to come to their (brand) events or simply to come
by tonight for a special promotion. It is a kind of mix between entertainment (event) and transaction
(promotion) of the previous CA. Posts that congratulate someone were only classified as experiential if
they also tried to prompt a reaction from consumers, for example by saying “come and celebrate
tonight our bartenders’ birthday!” Posts were classified into the emotional category if they really wanted
to evoke emotions, for example by talking about a beautiful day, saying goodbye to someone who worked
for a long time at the bar or thanking everyone for coming to a charity event. The last category of the
MSA classifies posts into brand image. The main focus of these posts is on the direct relation to the
image of the brand. Applied to our data, it is about saying happy birthday to someone of the company,
talking about the history of the company, thanking people from the company etc.
12.3 Marketeer’s Orientation Approach
The main focus of relationship-oriented posts is to end with an “interaction.” If the post is only about
the promotion of an event, then it will only be classified as task-oriented. So a post that says “come
tonight to our new event” is not seen as a real call to action but as a promotion type. Signing up for an
online contest, calls to action or applying at the bar for a job are some examples of relationship-oriented
messages. Posts that thank people or wish a happy birthday to staff members were
classified as self-oriented posts.
12.4 Takeaways
1. Classification depends mainly on human interpretation and judgement; other people can
interpret messages in a different way and classify them into another categorization variable.
2. Classification depends on the industry of the companies.
3. The Message Strategy Approach is dominated by the experiential category, while the
Marketeer’s Orientation Approach is dominated by task-oriented posts. This is mainly due to the
characteristics of our data, which come from bars and restaurants. For example, in the
manufacturing industry, the share of the functional category of the message strategy would be much
higher, since these kinds of companies promote their products more than events.
13 Methodology
13.1 Data
The data, coming from Facebook, were collected over 6 years, between the first of January 2011 and
the 30th of December 2016. In total, 240.210 different posts were assembled from 476 different
Facebook pages. We believe that a time span of 6 years is sufficient and valuable for our research. For each
post, the feed_id was collected (unique value) together with the time and date of the post creation
(feed_created_time), the content of the message (feed_message), the number of reactions, shares and
comments of a post (reactions_count, shares_count & comments_count), the page_name (ID of the
company’s Facebook page) and the time and date when the post was extracted (extracted_on). Through
a random check of 10 different page IDs, we could verify which kinds of companies our data
consisted of (Exhibit 2). All of the 10 Facebook pages had bar as a page tag. A closer
look at some random Facebook posts also made it clear that our data come from the bar/restaurant
industry.
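The collected fields can be represented as one record per post. The following Python sketch is only an
illustration: the field names are the ones listed above, while the Python types, the helper method
has_text and all sample values are assumptions that do not come from the dataset.

```python
from dataclasses import dataclass, replace
from datetime import datetime


@dataclass
class FacebookPost:
    feed_id: str                  # unique value per post
    feed_created_time: datetime   # time and date of the post creation
    feed_message: str             # text content of the post (may be empty)
    reactions_count: int
    shares_count: int
    comments_count: int
    page_name: str                # ID of the company's Facebook page
    extracted_on: datetime        # time and date when the post was extracted

    def has_text(self) -> bool:
        """False for posts that only shared a media element and carry no text."""
        return bool(self.feed_message.strip())


# Made-up sample values, for illustration only.
sample_post = FacebookPost(
    feed_id="example_1",
    feed_created_time=datetime(2016, 1, 31, 12, 0),
    feed_message="CONTEST TIME",
    reactions_count=10,
    shares_count=2,
    comments_count=5,
    page_name="example_bar",
    extracted_on=datetime(2017, 1, 1, 9, 30),
)
media_only_post = replace(sample_post, feed_id="example_2", feed_message="")

print(sample_post.has_text(), media_only_post.has_text())
```

A check such as has_text makes it possible to separate posts without message text, which the human coder
assigned zero for every category.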
13.2 Model description
The conceptual framework of our thesis can be found in Figure 7. Our methodology focuses on
answering the two research questions. First of all, we want to evaluate which approach is most suitable
for content classification (performance analysis). Secondly, we look at the relationship of each variable
of each content classification approach with BPP (number of reactions, shares & comments) (impact
analysis).
In total, our research takes five different approaches into account for the classification of Facebook
posts. The first model is based on topic modelling, an unsupervised method which
automatically classifies documents into themes (‘A gentle introduction to topic modeling using R’,
2015). We have chosen topic modelling since it is the most suitable unsupervised modelling technique
for message data. The other four approaches (BCA, CA, MSA & MOA) have been described in depth in
previous sections. We refer to the literature review for a better understanding of the
different variables which are used in each classification framework. As mentioned before, we did not
take the VMRA into our models since this model has too much overlap with the other approaches.
First of all, we explain the unsupervised model, which makes use of topic modelling, together with the
evaluation approach. Since this model does not make use of a labelled outcome, we can label the entire
dataset, as well as evaluate the performance of the model, by applying it straight to all messages. The
approach for our supervised models is slightly different. Here, we differentiate between the
performance of the model and the labelling of the entire dataset, which is needed for our impact analysis.
First of all, we will apply Random Forest (RF) as a classification algorithm on the 1000 human-classified
posts (labelled dataset). In doing so, we split our labelled dataset into a training and a test set, which
38
is needed to evaluate the performance (performance analysis) of our supervised classification
frameworks. We will try to come up with a “best” classification approach which is suitable in SM.
Secondly, we conduct a basic impact analysis of all the posts on BPP (number of reactions, count &
comments). Since labelling 240.210 posts for every supervised classification approach would be too
much time consuming, we will build a predictive RF classification algorithm on the labelled dataset
(1000 posts) and apply this algorithm to the whole dataset (so no test set is included).
Figure 7: Conceptual Framework
Our methodology is structured as follows. First, a brief explanation is given of the basic matrix that
is needed for applying topic modelling as well as RF. Secondly, we explain the methodology and
performance metrics of our topic model (unsupervised algorithm). Thirdly, the (supervised)
classification algorithm RF is explained, followed by the method for the performance evaluation
of RF. The last part consists of the methodology to test the variables of each content classification
approach on BPP (number of reactions, shares & comments).
13.3 Data preparation
The basis for the topic modelling method as well as for the supervised classification algorithm is the
creation of a document term matrix (DTM). A DTM has documents in the rows (applied to our data, the
Facebook posts) and the words that appear in each document in the columns (Feinerer, 2018). Some
pre-processing is needed to arrive at this matrix. First of all, non-recognizable characters and
emoticons were deleted from each Facebook message. Next, a corpus was created, which is a collection
of text documents that allows us to clean the documents (e.g. remove numbers, remove punctuation,
remove stop words, stem the messages etc.). To get rid of unnecessary words, terms had to appear in at
least 0.5% of the Facebook posts to stay in the matrix.
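The construction of the matrix can be illustrated with a minimal Python sketch. It is not the R code used in this study: the function name build_dtm, the whitespace tokenisation and the toy posts are assumptions made for illustration, and the cleaning steps (stop words, stemming) are left out. Only the idea of counting terms per post and dropping terms below a minimum document share is taken from the description above.

```python
from collections import Counter


def build_dtm(documents, min_doc_share=0.005):
    """Return (terms, rows): a document term matrix without rare terms.

    Hypothetical helper: tokenisation is a plain lowercase whitespace split.
    """
    tokenised = [doc.lower().split() for doc in documents]
    # Document frequency: in how many posts does a term occur at least once?
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    min_docs = min_doc_share * len(documents)
    terms = sorted(t for t, df in doc_freq.items() if df >= min_docs)
    column = {t: j for j, t in enumerate(terms)}
    rows = []
    for tokens in tokenised:
        row = [0] * len(terms)
        for t in tokens:
            if t in column:
                row[column[t]] += 1
        rows.append(row)
    return terms, rows


# Toy posts; a 50% threshold is used here only because the example is tiny.
posts = ["live music tonight", "free beer tonight", "karaoke"]
terms, dtm = build_dtm(posts, min_doc_share=0.5)
```

With the default min_doc_share of 0.005 the same function applies the 0.5% rule described above.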
[Figure 7 diagram: the unsupervised Topic Approach (party, food, performance) and the supervised Base
Content Approach (information, entertainment/transaction), Content Approach (information, entertainment,
transaction), Message Strategy Approach (functional, experiential, emotional, brand resonance) and
Marketeer's Strategy Approach (task-oriented, relationship/interaction-oriented, self-oriented) are
evaluated in the performance analysis and related to Brand Post Popularity (number of reactions, shares
and comments) in the impact analysis, with number of words and weekend day as control variables.]
13.4 Unsupervised algorithm
Our first model is based on UL, which requires no human involvement to label the brand posts. Through
this technique we want to capture the underlying structure or dimensions in the data without knowing
the corresponding labelled output beforehand. Even though previous research (Netzer et al., 2012) made
use of factor analysis, we use topic modelling for our unsupervised classification approach. It is a
method specifically developed for text content, while factor analysis is more commonly used on survey
data. Topic modelling looks for “topics” in the collection of posts and discovers hidden structure in
the data. One of the main advantages of topic modelling is that, compared to hard clustering methods,
it allows words to overlap across the different topics. We make use of Latent Dirichlet allocation
(LDA), which is a “probabilistic model of a collection of composites made up of part” (‘Your
Easy Guide to Latent Dirichlet Allocation’, 2018). Applied to our data, each Facebook message is stored
as a separate document (composite) which consists of different topics (parts), and each topic consists
of a bag of words (Silge & Robinson, 2017). The main concept of topic modelling is that the goal is to
derive the hidden structure in the data, given the words and documents. This is done by an iterative
process that tries to recreate each document by adjusting the relative importance of words in topics
and of topics in documents until a “best” topic structure is found.
To find the optimal number of topics, we make use of the ldatuning package from Murzintcev (2019).
The method trains multiple LDA models, one for each candidate number of topics k. To evaluate the
models we follow the simple approach which states that the best number of topics is where the metrics
Arun2010 and CaoJuan2009 are minimized and the metrics Deveaud2014 and Griffiths2004 are maximized.
Since our supervised classification approaches range between 2 and 4 categorization types, we do not
want to have too many topics and let the parameter k range between 2 and 5.
To further evaluate the model, we first look at the probability of each word being generated from a
topic. This can be evaluated by looking at the per-topic-per-word probability, called β (“Beta”). To get
a better understanding of the content of each topic, we look at the 10 most common terms of each topic,
check whether the different topics make sense and try to come up with an overarching theme. Secondly,
LDA also estimates each document as a mixture of topics, which can be evaluated by looking at the
per-document-per-topic probabilities, called γ (“Gamma”).
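The two probabilities can be written down compactly. The notation below is an assumption introduced for illustration (k indexes the K topics, w a term, d a document); it is the standard reading of an LDA model and not a formula taken from the packages used in this study.

```latex
\beta_{k,w} = P(\text{term} = w \mid \text{topic} = k), \qquad \sum_{w} \beta_{k,w} = 1
\gamma_{d,k} = P(\text{topic} = k \mid \text{document} = d), \qquad \sum_{k=1}^{K} \gamma_{d,k} = 1
P(w \mid d) = \sum_{k=1}^{K} \gamma_{d,k}\, \beta_{k,w}
```

The last line shows how both parameters combine: the probability of seeing a term in a post is the mixture of the topic-specific term probabilities, weighted by how strongly each topic is present in that post.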
13.5 Supervised algorithm
Our next section focuses on the classification algorithm used for the SL models. We make use of RF
together with a Singular Value Decomposition (SVD) step to reduce the number of columns. As
mentioned before, we apply RF twice: once to check the performance of our model, and a second time
to label our entire dataset.
The following 4 steps are repeated for each variable of a classification approach before RF is applied
(for testing the performance on the test data, as well as for labelling the entire dataset).
1. Make a corpus for the training data and for the test/entire data.
2. Create training and test/entire data with n-grams.
3. Make a DTM from the training and test/entire data so that they have the same terms.
4. Convert the DTMs to common sparse matrices and apply Singular Value Decomposition.
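Steps 2 and 3 above can be sketched in Python as follows. This is an illustration under assumptions, not the R implementation of this study: the helper names (ngrams, fit_vocabulary, to_dtm), the choice of uni- and bigrams (n_max = 2) and the toy posts are hypothetical. The point is that the column set is fixed on the training data and reused for the test or entire data, so both matrices have the same terms.

```python
def ngrams(tokens, n_max=2):
    """All 1..n_max-grams of a token list, joined with underscores."""
    return ["_".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def fit_vocabulary(train_docs, n_max=2):
    """Fix the column order using the training documents only."""
    vocab = sorted({g for doc in train_docs for g in ngrams(doc.split(), n_max)})
    return {term: j for j, term in enumerate(vocab)}


def to_dtm(docs, vocab, n_max=2):
    """Count n-grams per document; terms unseen in training are dropped."""
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for g in ngrams(doc.split(), n_max):
            if g in vocab:
                row[vocab[g]] += 1
        rows.append(row)
    return rows


train = ["happy hour tonight", "live band tonight"]
new = ["happy hour friday"]          # stands in for the test or entire data
vocab = fit_vocabulary(train)
train_dtm = to_dtm(train, vocab)
new_dtm = to_dtm(new, vocab)
```

Terms such as "friday" that never occur in the training data have no column and are ignored, which keeps the new matrix compatible with a model trained on train_dtm.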
13.5.1 Singular Value Decomposition
SVD, known in text mining as latent semantic indexing, is a widely used technique to reduce the number
of columns (Klema & Laub, 1980). It states that every m x n matrix A can be factorised as
$$A = U \Sigma V^{T}$$
with U: the left singular vectors (matrix with orthonormal columns, m x r), Σ: the singular values
(non-negative diagonal matrix, r x r) and V: the right singular vectors (V^T has orthonormal rows,
r x n). Also Σ = diag(σ1,…,σr), r = min(m,n), with σ1 ≥ … ≥ σr ≥ 0 the singular values. We make use of
the irlba package from Lewis (2019) and set the number of vectors equal to 50. This results in a matrix
with the concept loadings of each message on each vector, which is used as the new matrix on which we
apply RF. The main reason for applying this technique is to reduce the complexity of our DTM by
reducing the number of columns, which represent the words that appear in the messages.
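To make the reduction step concrete, the sketch below approximates the leading right singular vectors of a small matrix with plain power iteration and projects each document onto them. This is a deliberately simple stand-in and not the algorithm of the irlba package; the function names, the toy matrix and the choice of A·V as "loadings" are assumptions for illustration.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def leading_right_vectors(dtm, k, iters=200, seed=0):
    """Approximate the k leading right singular vectors by power iteration
    on A^T A, removing the directions that were already found."""
    rng = random.Random(seed)
    dtm_t = [list(col) for col in zip(*dtm)]
    found = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(len(dtm_t))]
        for _ in range(iters):
            # Orthogonalise against earlier vectors, then apply A^T A.
            for u in found:
                proj = sum(a * b for a, b in zip(v, u))
                v = [a - proj * b for a, b in zip(v, u)]
            v = matvec(dtm_t, matvec(dtm, v))
            norm = math.sqrt(sum(a * a for a in v))
            v = [a / norm for a in v]
        found.append(v)
    return found


def document_loadings(dtm, right_vectors):
    """Project every document (row) onto the retained vectors: A V_k."""
    return [[sum(a * b for a, b in zip(row, v)) for v in right_vectors]
            for row in dtm]


# Toy matrix with 4 documents and 3 terms; singular values are 3, 2 and 1.
toy_dtm = [[3, 0, 0], [0, 2, 0], [0, 0, 1], [0, 0, 0]]
vectors = leading_right_vectors(toy_dtm, k=2)
loadings = document_loadings(toy_dtm, vectors)
```

The result has one row per document and k columns instead of one column per term, which is the shape of the reduced matrix that is passed on to the classifier. The sign of each column is arbitrary, as in any SVD.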
13.5.2 Random Forest
RF is an ensemble of decision trees which are trained with a bagging method and random variable
selection (Breiman, 2001). The concept of bagging is that successive trees do not depend on earlier trees.
Each tree in the forest is constructed using a bootstrap sample (i.e. sampling with replacement) and
the prediction is based on the majority vote (Liaw & Wiener, 2002). Bagging improves stability and
accuracy by reducing variance, while boosting primarily targets bias. In other words, RF builds
multiple decision trees and merges them to increase performance and to get a more stable prediction
(Donges, 2018). The Binary Recursive Partitioning algorithm is used to construct the trees of the
forest. At each node, it searches for the best possible split to create a binary partitioning of the
data, considering only a randomly chosen subset of the predictor variables (Donges, 2018). In practice,
the best split s is the one that maximizes the decrease of impurity of the parent node τ, as measured by
the Gini index p(1 − p) (Berk, 2007), as follows:
$$\Delta(s, \tau) = p(\tau)\,(1 - p(\tau)) - \frac{|\tau_L|}{|\tau|}\, p(\tau_L)\,(1 - p(\tau_L)) - \frac{|\tau_R|}{|\tau|}\, p(\tau_R)\,(1 - p(\tau_R))$$
with τ the cases in the parent node, τ_L the left child node cases, τ_R the right child node cases, p
short for P(Y = 1) with Y ∈ {0,1} and | . | representing cardinality (number of elements of the set).
Finally, predictions are made by aggregating the predictions of the trees by majority vote. The model
iterations can be summarized by the following steps (Liaw & Wiener, 2002):
1. Draw ntree bootstrap samples from the original dataset.
2. For each bootstrap sample, grow an un-pruned tree, choosing at each node the best split among a
random subset of mtry predictors (mtry equal to the number of predictors gives bagging).
3. Predict new data by aggregating the predictions of the ntree trees (majority vote).
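The split criterion used inside each tree can be illustrated with a few lines of Python. The sketch only covers the search for the best threshold on a single numeric predictor using the Gini index p(1 − p); it is not the randomForest implementation, and the function names and toy values are assumptions.

```python
def gini(labels):
    """Gini impurity p(1 - p) of a binary node, with p the share of ones."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return p * (1 - p)


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of both children."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


def best_threshold(values, labels):
    """Scan the midpoints of one predictor for the largest impurity decrease."""
    best_gain, best_cut = 0.0, None
    points = sorted(set(values))
    for low, high in zip(points, points[1:]):
        cut = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= cut]
        right = [y for x, y in zip(values, labels) if x > cut]
        gain = impurity_decrease(labels, left, right)
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_gain, best_cut


# Toy node: one SVD vector as predictor, 1 = post belongs to the category.
gain, cut = best_threshold([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
```

In a forest, this search would be repeated at every node for each of the mtry randomly drawn predictors, and the predictor and threshold with the largest decrease would be used for the split.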
RF has some advantages over other classification algorithms. First of all, it is a user-friendly
algorithm: only two parameters have to be set, the number of trees and the number of variables tried at
each split. We follow the standard guidelines from Breiman (2001) and set the number of trees high
(1001) and use the square root of the number of variables as the number of candidates for each split
(the default, mtry = √p). Since we apply SVD to reduce the number of columns (words) in our document
matrix, p equals the number of vectors. Secondly, RF is very robust to overfitting (which occurs when a
model is so complex that it fits the training data closely but generalizes poorly to the test data) and
it can deal with a large number of features since there are a lot of trees in the forest (Breiman, 2001;
Donges, 2018). This is important to our model since we have a lot of predictors (words) to be evaluated.
We use the randomForest package by Liaw & Wiener (2012) to apply this algorithm in RStudio.
13.5.3 Performance
To evaluate the performance of our different classification approaches we make use of a cut-off
independent measure. The RF model outputs the probability that a given event will take place. Applied
to our data, this is the probability that a given message belongs to a classification variable (e.g.
emotional or information). An evaluation based on a single cut-off is highly sensitive to the chosen
cut-off value. Therefore, the performance of the model is evaluated by the Area Under the Receiver
Operating Characteristic Curve (AUC). The calculation of the AUC is based on a comparison between the
predicted status of an event and the actual status of an event for all possible cut-off values between
0 and 1 (Larivière & Van Den Poel, 2005). The AUC is defined as follows:
$$\mathrm{AUC} = \int_{0}^{1} \frac{TP}{TP + FN}\, d\!\left(\frac{FP}{FP + TN}\right) = \int_{0}^{1} \frac{TP}{P}\, d\!\left(\frac{FP}{N}\right)$$
With TP: True Positives, FN: False Negatives, FP: False Positives, TN: True Negatives, P: Positives
and N: Negatives. The AUC can be represented graphically by plotting the true positive rate against the
false positive rate, which is equal to plotting the sensitivity (TP/P) against one minus the specificity
(1 − TN/N) across all possible thresholds (between zero and one). The area under this curve is the
metric and is used to evaluate the accuracy of the model (Hanley & McNeil, 1982). In theory the AUC
ranges between 0 and 1, but a useful model scores between 0.5 and 1. A value equal to 1 means that the
model predicts perfectly, while a value equal to 0.5 means that the model does no better than a random
selection. The AUC is our main evaluation metric for each variable of each classification approach.
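The following Python sketch shows how the area under the ROC curve can be computed from predicted probabilities by sweeping the cut-off and applying the trapezoidal rule. It is an illustration, not the routine used in this study; the function name roc_auc and the toy scores are assumptions.

```python
def roc_auc(scores, labels):
    """AUC by the trapezoidal rule over all cut-offs implied by the scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Sort by decreasing score: lowering the cut-off admits more posts.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    area = 0.0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Posts with the same score cross the cut-off together (ties).
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / positives, fp / negatives
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
    return area


# Toy predictions: 1 = post belongs to the category.
auc = roc_auc([0.9, 0.7, 0.6, 0.2], [1, 0, 1, 0])
```

In the toy example three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75. A model that gives every post the same score ends up at 0.5, the random baseline mentioned above.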
We can also look at the variable importance to see which variables have the most predictive power.
This can be calculated by the Mean Decrease in Accuracy (MDA) and the Mean Decrease in Gini (MDG). The
MDA of a variable is the difference between the Out-of-Bag (OOB) accuracy of the model and the accuracy
when the values of that variable are scrambled, computed for each tree and averaged to get a single
value. The OOB accuracy (the OOB data being the data not in the bootstrap sample at each bootstrap
iteration) is the proportion of instances correctly classified out of the total number of instances
(Liaw & Wiener, 2002). The MDG is equal to the average of a variable’s total decrease in node impurity
over all trees in the forest, measured by the Gini index. Impurity is calculated for each child node
and compared with the original node (‘Random Forests’, n.d.). Variables that split mixed nodes into
pure nodes have a higher MDG, which makes them more important variables. These measures are less
important for our study, first because we concentrate on the overall performance of the different
models and secondly because we used SVD to reduce the number of columns, so these metrics show the
importance of the vectors and not of the words.
13.6 Basic impact analysis
The last part of our research focuses on the relationship between the different classification
approaches and BPP (number of reactions, comments and shares). Whereas previous sections focused
only on the 1000 labelled BP’s to evaluate the performance of an approach, the impact analysis makes
use of the entire dataset (240 210 message posts). Our DV’s are count data that resemble a Poisson
distribution with a small lambda, since the number of observations decreases exponentially as the
number of likes, comments or shares increases. Exhibit 3 plots the count of the DV’s, also
distinguishing between the values of the information variable. A lot of BP’s produce no reactions,
comments or shares, while only a few posts have high counts. The lambda parameter sets the shape of the
curve of the DV and, under a Poisson distribution, is equal to both the mean and the variance of the
count variable (Cameron & Trivedi, 2005). Secondly, our DV’s are over-dispersed count data since the
(conditional) variance is higher than the (conditional) mean (e.g. the number of comments: M = 0.8,
SD = 7.48). Since our DV’s have a highly right-skewed distribution and the residual errors do not
follow a normal distribution, applying a normal linear regression is not suitable for our data. To
overcome this problem we make use of two adjusted regression models.
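The degree of over-dispersion can be made explicit with a quick calculation. The Python sketch below computes the variance-to-mean ratio from the example figures quoted above (M = 0.8, SD = 7.48); the function name is an assumption, and the ratio is an unconditional rule of thumb, not a formal test.

```python
def dispersion_ratio(mean, sd):
    """Variance-to-mean ratio; a Poisson distribution would give 1."""
    return sd ** 2 / mean


# Example figures from the text: M = 0.8, SD = 7.48.
ratio = dispersion_ratio(mean=0.8, sd=7.48)
```

A ratio of roughly 70 instead of 1 is far from the equality of mean and variance that a Poisson model assumes, which motivates the use of a model with a separate dispersion parameter.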
Ordinary Least Squares Regression (OLSR) - Our first model makes use of OLSR where the log is taken of
the DV + 1 (the log of zero is undefined). The log is also taken of the number of words (control
variable) since this variable also has an exponentially decreasing distribution. OLSR minimizes the sum
of squared residuals (the differences between the observed response values and the values predicted by
the model) (Gareth, Witten, Hastie, & Tibshirani, 2017). We use a stepwise mixed method for the
selection of the variables of the model. Still, this model lacks the capacity to model the dispersion.
An F-test is applied to check whether the OLSR model is significant (it checks whether at least one of
the regression coefficients is not zero). R square tells us more about the fit of the model, measuring
the proportion of variance in the DV explained by the IV’s. More specifically, we look at the adjusted
R square, which adjusts R square for the number of parameters taken into account (Gareth et al., 2017).
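Written out, the OLSR specification described above takes the following form. The symbols are assumptions introduced for illustration: y_i is the DV of post i, x_ij the classification dummies, w_i the number of words, d_i the weekend dummy, n the number of posts and p the number of predictors.

```latex
\log(y_i + 1) = \beta_0 + \sum_{j} \beta_j x_{ij} + \beta_w \log(w_i) + \beta_d d_i + \varepsilon_i
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
```

The second line shows why the adjusted R square is preferred when models with different numbers of variables are compared: adding a predictor increases p and therefore only raises the adjusted value if the gain in R square outweighs the penalty.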
Negative binomial regression (NBR) – An NBR can be used when data are highly over-dispersed, which
is still an issue when using OLSR. Compared to a Poisson regression (which assumes that the variance
is equal to the mean), it has an extra parameter to model over-dispersion, which accounts for the
higher variance compared to the mean of the count data (‘NEGATIVE BINOMIAL REGRESSION | R
DATA ANALYSIS EXAMPLES’, n.d.). The model finds the β coefficients of the different variables by
maximizing the likelihood function. A Chi square test is applied to test the overall significance of
the model, checking whether there is a difference between the residual deviance and the null deviance
of the specific NBR model. The test checks whether the lack of fit is reduced by adding variables to
the intercept-only model. Secondly, we also look at the Akaike Information Criterion (AIC) of the
model, which is another measure of the fit of the model
(‘NEGATIVE BINOMIAL REGRESSION | R DATA ANALYSIS EXAMPLES’, n.d.). AIC estimates
the relative information loss of a given statistical model. When different models are fitted to the
same data with the same DV, the model with the lowest AIC is considered the “best” model (Arnold, 2010).
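The role of the extra parameter and of the AIC can be summarised in formulas. The notation is an assumption for illustration: μ_i is the conditional mean of the count for post i, θ the dispersion parameter in one common parametrisation, k the number of estimated parameters and L-hat the maximised likelihood.

```latex
\log(\mu_i) = \beta_0 + \sum_{j} \beta_j x_{ij}
\operatorname{Var}(y_i \mid x_i) = \mu_i + \frac{\mu_i^2}{\theta}
\mathrm{AIC} = 2k - 2 \ln \hat{L}
```

As θ grows large the variance approaches μ_i and the model reduces to a Poisson regression, while small values of θ allow the variance to exceed the mean by a wide margin.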
Table 13: Different models of impact approaches
                                        Reactions      Shares         Comments
                                        OLSR   NBR     OLSR   NBR     OLSR   NBR
Multi Approaches model *                x      x       x      x       x      x
Isolated models
  Topic Model *                         x      x       x      x       x      x
  Base Content Approach *               x      x       x      x       x      x
  Content Approach *                    x      x       x      x       x      x
  Message Strategy Approach *           x      x       x      x       x      x
  Marketeer's Orientation Approach *    x      x       x      x       x      x
*Each approach also takes the 2 control variables into the model (number of words & weekendday)
To check the relationship between each categorization variable of the different approaches and BPP, we
use the following procedure. First of all, we build an overall model for each DV (reactions, shares or
comments), taking all the different classification variables into account (Multi Approaches model,
MAM). Since information is a category in the BCA as well as in the CA, we remove the information
variable of the CA from the MAM to avoid overlap. By using the MAM, we assume that there is not much
overlap between the other classification types. Secondly, to avoid relying on this assumption, we also
test each approach independently (Isolated model, IM) on the number of reactions, shares and comments.
Table 13 gives an overview of the different models which are tested.
13.6.1 Independent variables
Hopkins & King (2010) stated that only a few hundred documents need to be hand coded as training data,
after which this set can be used to label the larger population. They showed that beyond around 500
documents the additional gain in performance becomes small. Thus, coding more than about 500 documents
for the training data seems unnecessary when under time pressure. However, to increase the performance
of our model, we manually coded 1000 Facebook posts per category. We build a prediction model for each
variable of the approaches that are based on supervised learning. Next, we apply the model to our
entire dataset. In contrast to the previous section, which focused on the evaluation of the different
approaches and made use of a cut-off independent measure, we now want a deterministic binary prediction
of whether a message belongs to a category or not {0,1}. Our RF model therefore uses a single cut-off
to produce a deterministic binary classification. The outcome of the RF is 12 categorical factor
variables, each consisting of 2 levels (e.g. MD0_INF is equal to one if the message contains
information, zero otherwise).
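The step from predicted probabilities to a 0/1 variable can be sketched as follows. The cut-off of 0.5 is an assumption for illustration, since the text above does not state which value was used; the function name and the toy probabilities are hypothetical as well.

```python
def to_dummy(probabilities, cutoff=0.5):
    """Turn predicted class-1 probabilities into 0/1 dummies at one cut-off."""
    return [1 if p >= cutoff else 0 for p in probabilities]


# Toy probabilities that a post contains information (MD0_INF).
md0_inf = to_dummy([0.91, 0.48, 0.50, 0.07])
```

Unlike the AUC, the resulting dummy depends on the chosen cut-off: lowering it labels more posts as belonging to the category, raising it labels fewer.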
Table 14: Independent variables
Variable            Description
Topic Approach
TM_1                Dummy variable = 1 if content of the message contains the party topic, 0 otherwise.
TM_2                Dummy variable = 1 if content of the message contains the food topic, 0 otherwise.
TM_3                Dummy variable = 1 if content of the message contains the performance topic, 0 otherwise.
Base Content Approach
MD0_INF             Dummy variable = 1 if content of the message contains information, 0 otherwise.
MD0_TRA/ENT         Dummy variable = 1 if content of the message contains transaction/entertainment, 0 otherwise.
Content Approach
MD1_INF             Dummy variable = 1 if content of the message contains information, 0 otherwise.
MD1_ENT             Dummy variable = 1 if content of the message contains entertainment, 0 otherwise.
MD1_TRA             Dummy variable = 1 if content of the message contains transaction, 0 otherwise.
Message Strategy Approach
MD2_FUN             Dummy variable = 1 if content of the message contains functional, 0 otherwise.
MD2_EXP             Dummy variable = 1 if content of the message contains experiential, 0 otherwise.
MD2_EMO             Dummy variable = 1 if content of the message contains emotional, 0 otherwise.
MD2_BRA             Dummy variable = 1 if content of the message contains brand resonance, 0 otherwise.
Marketeer's Orientation Approach
MD3_TAS             Dummy variable = 1 if content of the message is task-oriented, 0 otherwise.
MD3_REL             Dummy variable = 1 if content of the message is relationship/interaction-oriented, 0 otherwise.
MD3_SEL             Dummy variable = 1 if content of the message is self-oriented, 0 otherwise.
Control variables
feed_word_length    Number of words of the message.
weekendday          Dummy variable = 1 if message is created during a weekend day (Saturday or Sunday), 0 otherwise.
Timing of a post has commonly been taken into account as a control variable. Cvijikj & Michahelles
(2013) found a positive relationship between a post created during the week and the number of
comments, while de Vries et al. (2012) found no significant relationship. We take the day of the
week (is the post created during the week or in the weekend) into account as a control variable.
Advertising literature also suggests that message length or the number of words affects BPP (de Vries
et al., 2012); therefore we also take the number of words of a BP into our model as a control variable.
13.6.2 Dependent variables
BPP or CE has commonly been used in SM studies as a DV. BPP most often consists of the number of likes
and the number of comments for a specific post (de Vries et al., 2012; Sabate et al.,
2014; Shen & Bissell, 2013; Swani et al., 2017). How many times a post is shared on SM is another
variable which has been used as a DV (Cvijikj & Michahelles, 2013; Kim et al., 2015; Tafesse, 2015).
In contrast to de Vries et al. (2012), where the numbers of likes and comments were used in absolute
value, the studies of Cvijikj & Michahelles (2013) and Kim et al. (2015) used the like ratio, comment
ratio and share ratio as DV’s, which adjust the variables for the number of brand fans. Lee et al.
(2018) also added the number of click-throughs as a DV of the online engagement score. To operationalize
BPP, which stimulates online CE, we make use of the following three dependent variables: the number of
reactions, shares & comments.
Companies that make efficient use of their Facebook brand pages through a well-established online
marketing strategy can strengthen their customer relationships. Moreover, followers who react to a post
or share a post feel more involved with the company. Nowadays, people who contribute to the online
engagement of a specific brand page can even earn the “super fan” label (Porterfield, 2011). Sharing
Facebook posts also enlarges the reach of the message, which can result in new potential customers. Our
first DV is the number of reactions, whereas previous research used the number of likes.
Figure 8: Reactions possibilities on a Facebook post (Krug, 2016)
In February 2016 Facebook launched its reaction buttons worldwide. They are an extension of the like
button and give users the opportunity to react quickly and easily in more ways (Krug, 2016). Besides
the like button, users can also click on the love, haha, wow, sad or angry reaction (Figure
8). Since our data were collected between the beginning of 2011 and the end of 2016, some posts will
also contain reactions other than likes. However, most of the reactions will still be likes since the
other 5 reactions were not available during the first 5 years of our data collection. A random look at
the reactions to posts in our data created after the 6 reaction possibilities were launched made it
clear that only the like button was used in our data. Secondly, since our data mostly come from
restaurants and bars that share their specific events and promotions, we assume that the sad and angry
buttons have not been used a lot. We therefore assume that the number of reactions is still equal to
the number of likes in our study. The number of shares of a brand message is the number of persons who
have reposted the brand post on their own personal page. The number of comments refers to how many
comments the message has in total.
14 Results
14.1 Descriptive statistics
An overall review of the descriptive statistics can be found in Exhibit 4. To get a first impression of
our data, we looked at which words appear most often in the messages of the brand pages, i.e. which
words are used most in the bar/restaurant industry. The five most used words are tonight (N = 42 823),
come (N = 42 609), night (N = 40 106), will (N = 35 914) and day (N = 30 086). To get an even better
understanding of our data, we first imposed the condition that only words that appear in at least 3% of
all posts are taken into account. Secondly, we left out some words that add little to understanding the
content of the messages (e.g. come, get, will, see etc.). Exhibit 4 contains 3 word clouds (1st one: no
conditions, 2nd one: 3% condition, 3rd one: 3% condition and low-meaning words condition). Based on the
word clouds, we can see that most of the posts contain information about the timing of events (tonight,
night, hour, Friday, weekend, day etc.), bars (drink, beer, music, band etc.) and promotions (free,
special etc.).
Companies use different types of messages to stimulate engagement on Facebook. With regard to the
topics, posts about party (topic 1, 40% of total) were most frequent, followed by performance (topic 3,
36% of total) and food (topic 2, 7% of total). The names of the topics are explained further in the next
section. In terms of our base model, posts providing transaction and entertainment content were most
common (148 635 occurrences, 62% of total). Concerning the CA, entertainment posts appeared the
most (54%), followed by information posts (20%). Only a few posts were classified as
transaction (7 375 occurrences, 3%). Messages containing experiential content dominate the Message
Strategy Approach (66%), while the other 3 categories each occurred in less than 1% of the total posts
(functional: 0.01%, emotional: 0.07% and brand resonance: 0.10%). As mentioned before, this is because
our data come from restaurants and bars, which mostly post experiential (event) content. In terms of the
MOA, task-oriented posts were most frequent (150 286 occurrences, 63%), followed by
relationship/interaction-oriented posts (0.49%) and self-oriented posts (0.40%).
To get a first insight into how BPP is perceived on Facebook, we looked at the descriptive statistics of
the three dependent variables (number of reactions, number of comments and number of shares). In
general, fans engage online with the brand in the form of automated reactions (M = 7.598, SD = 32.42)
more frequently than through comments (M = 0.803, SD = 7.41) and shares (M = 1.238, SD = 30.82).
Most BP’s were placed on a Friday, while Sunday is the day on which the fewest posts are published. A
likely explanation is that bars and restaurants promote their events on Friday evening to start the
weekend, while Sunday is mostly a hangover day. A little more than 25% of the brand messages were posted
during the weekend. A Facebook post contains on average around 150 characters (M = 145.8, SD = 263.04)
and consists of around 25 words (M = 24.91, SD = 42.75).
14.2 Topic model
The document matrix described in the previous section was the basis for our topic modelling
(UL approach). Words that did not appear in at least 0.5% of the Facebook posts were removed to get
rid of unhelpful words. This reduced the number of columns, which represent words, from 80 574 to only
482 important terms.
Figure 9: Optimal number of topics
Based on the ldatuning package from Murzintcev (2019), we came up with 3 topics (Figure 9). The
Griffiths2004 and Arun2010 metrics are not informative, since they simply go from zero to one or
from one to zero and show no fluctuation over the number of topics. If we look at the other 2 metrics,
we can see that the CaoJuan2009 metric is minimized at 3 topics and the Deveaud2014 metric is
maximized at 3 topics. In practice, the number of topics is usually larger (Silge & Robinson, 2017).
Since our research wants to compare the different models, we wanted to limit the number of possible
topics. A model consisting of around 100 topics would be too big to analyse and would obscure the
overarching concept of topic modelling. To get a first understanding of the different topics, we wanted
to give a general name to each topic. By asking people which word comes to mind when shown the
10 most common words of each topic (Figure 10), we arrived at the following topic names. (1) PARTY:
words that came to mind when people were shown the common terms of topic 1 were friends, leisure,
activity, planning, events, opening of bars and restaurants. In our view, the best overall topic name
is party, since the topic mainly covers opening hours, promotions and drinks. (2) FOOD: some
words which are commonly used in topic 2 are cheese, chicken, salad and onion, which are food
ingredients. It also covers some food promotions (today & special). (3)
PERFORMANCE: the third topic covers performances (concerts & gigs). The top 10 terms of a
topic are the words with the highest word-topic probabilities, measured by the β parameter (Figure 10).
For each possible combination of a word and a topic, the β coefficient is equal to the probability of
that term being generated from that topic (Silge & Robinson, 2017). For example, the first word in our
document term matrix is “acoust”, derived from acoustic. The word reminds us of sound and hearing.
That is why we would match this word mostly with performance, a little with party and least with
food. The beta values of acoust for the 3 topics confirm this. The term has a 1.653566e-03 probability
of being generated from the performance topic, while it has only a
4.219594e-32 probability of being generated from the food topic.
Figure 10: Top 10 terms of each topic
Besides looking at the β parameter, we can also look at the document-topic probabilities by analysing
the γ parameter. It gives us the estimated proportion of words of a specific message that are generated
from topic 1, 2 or 3. One message post from our data has the following content: “Make plans to join us
Saturday from 3-9 for the Pooper Party, great band from superior wi, playing oldies, country, and good
ole rock and roll, lunch, snacks, hats, horns, champagne, plus win cash and prizes, bring on 2017 a bit
early this year plus another party at midnight for all.” About 41% of the words were generated from
topic 1 (party), 7% from topic 2 (food) and 52% from topic 3 (performance). Based on the content of the
message, we can confirm that this message is more about party and performance than about food. Based on
these results, we conclude that our topic model performs reasonably well. First of all, the 3 different
themes made sense when we looked at the 10 most common words of each topic. Secondly,
a closer look at the β and γ coefficients of our model confirmed that our 3 topics are representative.
14.3 Random Forest
As mentioned before, four classification approaches were built. Since this section focuses on the
performance of these approaches, we mainly concentrate on the AUC of each variable in each
classification framework. AUC measures the performance of a binary classifier; consequently, we applied
RF separately to every variable of each proposed classification approach. In total, an RF model was
built 12 times to test the prediction efficiency of each variable. A deeper analysis is given only
once, for the first variable of the BCA (the information model), by looking at the OOB error, the
confusion matrix and the variable importances. Tuning is also applied to this first model to see
whether it can be improved. The labelled dataset was used to test the performance of the models: it was
randomly split into a training set of 800 posts and a test set of the remaining 200 posts. Because our
research focuses on comparing the different classification approaches, AUC is our primary evaluation
metric.
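Since AUC carries most of the evaluation, a minimal sketch of what it measures may help. The Python function below computes AUC as the share of (positive, negative) pairs in which the positive case receives the higher score, counting ties as one half. The name `auc_from_scores` and the toy labels and scores are illustrative assumptions, not output from our models.

```python
def auc_from_scores(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example: 2 positive posts and 3 negative posts with predicted scores.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.1]
toy_auc = auc_from_scores(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

A value of 0.5 corresponds to a random ranking and a value of 1.0 to a perfect ranking, which is the scale on which the AUCs in this section should be read.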
14.3.1 Information model
Our first model classifies messages into information and non-information. The two parameters of RF
were set at 1001 (number of trees) and 7 (variables tried at each split; this number is approximately
the square root of 50, the total number of vectors in our base matrix). The OOB error is the mean
prediction error on the data not included in the bootstrap sample, taken over every bootstrap iteration
and its related tree (Liaw & Wiener, 2002). The OOB error for our information model is 20.15%. This
rate is on the boundary of what is acceptable, so we check whether tuning our RF model can decrease it.
The class error of non-information is very good at 3%, but the class error for the prediction of
information is too high (74%): 51 messages were correctly classified as information (TP), while 143
messages were incorrectly classified as non-information (FN). The accuracy on the test data is 83%,
which is in line with the good AUC of our model. In order to understand which SVD vectors drive the
results, we looked at the variable importances, measured by the MDG and the MDA. For both metrics, v1,
v10 and v32 are the most important variables (vectors) of our model (Exhibit 5). As mentioned before,
this is less useful to our thesis since we focus on the performance of the different classification
approaches. Moreover, SVD reduced the words to vectors, which are harder to interpret than individual
words.
Tuning of the model did not improve the OOB error rate. As a result, we keep the original prediction
model with 1001 trees.
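The two numbers that drive this discussion can be checked with a few lines of arithmetic. The Python sketch below recomputes the class error of the information class from the reported counts and the number of variables tried at each split from the 50 SVD vectors. The helper name `class_error` is hypothetical, and the square-root rule is the usual default for classification forests rather than something specific to our setup.

```python
import math


def class_error(correct, incorrect):
    """Share of cases of one true class that the model assigned to the wrong class."""
    return incorrect / (correct + incorrect)


# Counts reported for the information class: 51 true positives, 143 false negatives.
information_error = class_error(correct=51, incorrect=143)

# Usual default for classification forests: about sqrt(number of predictors) per split.
n_vectors = 50
variables_per_split = math.floor(math.sqrt(n_vectors))

print(f"class error (information): {information_error:.0%}")
print(f"variables tried at each split: {variables_per_split}")
```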
14.3.2 Evaluation supervised approaches
In what follows, we evaluate and compare the different supervised classification approaches. As
mentioned before, we look at the AUC values (Table 15) and at the ROC curves (Figure 11) of each
classification model, since AUC is the most important metric for checking the performance of
classification labelling. The AUCs for the information variable and the transaction/entertainment
variable of the Base Content Approach are quite good, 0.82 and 0.72 respectively. The Content Approach,
which separates the transaction and entertainment categories that the BCA combines, yields an AUC of
0.87 for the entertainment model and an AUC of 0.83 for the transaction model, which is very good. The
AUC of the information variable of the CA is 0.78, slightly lower than the AUC of information in the
BCA. This small deviation is due to the fact that a new random training and test set was made for each
variable model. Splitting transaction/entertainment into transaction and entertainment made them
better prediction variables (Figure 11(a) and Figure 11(b)): the ROC curves of entertainment and
transaction in the CA lie above that of information, whereas in the BCA the information curve lay above
that of the transaction/entertainment variable.

Table 15: AUCs of the different classification variables

Approach                            Variable       AUC
Base Content Approach               MD0_INF        0.82
                                    MD0_TRA.ENT    0.72
Content Approach                    MD1_INF        0.78
                                    MD1_ENT        0.87
                                    MD1_TRA        0.83
Message Strategy Approach           MD2_FUN        0.53
                                    MD2_EXP        0.93
                                    MD2_EMO        0.72
                                    MD2_BRA        0.86
Marketeer's Orientation Approach    MD3_TAS        0.83
                                    MD3_REL        0.77
                                    MD3_SEL        0.91

If we look at the AUC values of the Message Strategy Approach, which classifies message posts into
functional (AUC = 0.53), experiential (AUC = 0.93), emotional (AUC = 0.72) and brand resonance
(AUC = 0.86), we see a low value for the functional variable. As mentioned before, a value lower than
0.5 can indicate that the model is overfitted, and 0.53 is only marginally above this level. Out of the
1000 posts that were manually coded, only 22 posts were coded as functional. The low AUC can therefore
be due to the small number of functional posts in our training data, which made our model too strict in
classifying messages into the functional category. Secondly, the ROC curve of the functional model
sometimes drops below the diagonal, which means that at those points a random classification would
perform better than our model (Figure 11(c)). Thirdly, as stated in the human classification section,
the functional variable is not the most suitable variable for our data, which come from bars and
restaurants; the low AUC confirms this. Apart from functional, the other variables perform very well.
Even the model for brand resonance, which made up only 6% (57 posts) of the labelled dataset, has a
good prediction performance (AUC = 0.86). Our last approach (Marketeer’s Orientation Approach) is also
a good approach for message classification since its AUCs are quite high (Table 15). Self-oriented
posts, which occurred in only 8% of the labelled data, have the best classification performance,
followed by task-oriented messages and relationship/interaction-oriented messages (Figure 11(d)).
Figure 11: ROC curves of the different classification approaches
14.4 Basic impact approach
14.4.1 Model evaluation
Table 16 gives an overview of the evaluation parameters (adjusted R squared for OLSR, AIC and
Nagelkerke R squared for NBR) concerning the fit of all the different models. The adjusted R squared
ranges between 1.20% and 2.74% for the OLSR models, so most of the variation in Y is not explained by
the variation in the IVs. At first sight, these low values look worrying. However, we have to take
certain elements into account when judging them. In some fields, a low R squared is more common than in
others, especially in a field such as ours where we want to predict human behaviour (‘How to
Interpret a Regression Model with Low R-squared and Low P values’, 2014). Predicting whether a
person will like a post or not is more difficult than making predictions in the “pure science” fields,
where predictions need to have a high degree of accuracy. BPP also depends on factors (state of
mind, social pressure, personal interest etc.) other than the categorization type of the post, the
number of words and whether the post was published in the weekend or not. We can still draw important
conclusions since some predictors are statistically significant. We can explain how changes in the IVs
are associated (positively, negatively or not at all) with BPP. It is not worthwhile to report the
precise predicted effect for a specific message, since the spread of the data points around the
regression line is quite high.
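For reference, the adjusted R squared reported for the OLSR models penalises the plain R squared for the number of predictors. The Python sketch below shows the standard formula; the function name, the sample sizes and the R squared of 2.5% are hypothetical and are not taken from our data.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Penalise R squared for the number of predictors in the model."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical example: an R squared of 2.5% with 5 predictors.
small_sample = adjusted_r_squared(0.025, n_obs=200, n_predictors=5)
large_sample = adjusted_r_squared(0.025, n_obs=200_000, n_predictors=5)
print(round(small_sample, 4), round(large_sample, 4))
```

With many observations the penalty becomes negligible, so in large samples a low adjusted R squared reflects a genuinely low share of explained variation rather than an artefact of the adjustment.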
Table 16: Evaluation of the different approaches

                                      Reactions                        Shares                           Comments
                                      OLSR       NBR        NBR        OLSR       NBR        NBR        OLSR       NBR        NBR
                                      adj. R^2   NR^2       AIC        adj. R^2   NR^2       AIC        adj. R^2   NR^2       AIC
Multi Approaches model *              2.41%      2.41%      1,324,565  2.74%      2.91%      569,160    2.51%      4.27%      459,845
Isolated models
  Topic Model *                       2.23%      2.19%      1,325,087  2.43%      2.15%      570,905    2.02%      3.19%      462,134
  Base Content Approach *             1.60%      0.71%      1,328,659  2.31%      1.67%      571,890    1.24%      2.02%      464,584
  Content Approach *                  1.65%      0.84%      1,328,347  2.37%      2.15%      570,840    1.55%      2.30%      464,001
  Message Strategy Approach *         1.70%      0.51%      1,329,148  2.05%      1.53%      572,216    1.20%      1.26%      466,172
  Marketeer's Orientation Approach *  1.69%      0.54%      1,329,084  2.07%      1.62%      572,015    1.39%      1.50%      465,684

* Each approach also takes the 2 control variables into the model (number of words & weekend day)
adj. R^2 for OLSR; AIC & Nagelkerke R^2 (NR^2) for NBR

Secondly, we looked at the AIC values of the NBR models. Unlike the R squared of a linear regression
model, the AIC of a single model has little value on its own: AIC is used to compare different models
for the same data and the same DV. The model with the lowest AIC, and so the lowest loss of
information, is considered the “best” model. Models with a higher AIC parameter are perceived as more
complex models (‘NEGATIVE BINOMIAL REGRESSION | STATA DATA ANALYSIS EXAMPLES’, n.d.). We also added
the Nagelkerke R squared as an extra check; it compares the likelihood of the full model with the
likelihood of an intercept-only model (Mangiafico, 2019). Based on the two fit parameters for OLSR and
NBR (and the confirmation of the Nagelkerke parameter), the MAM scores best, followed by the topic
approach model and the CA model. The AIC value of the reactions model is also higher than that of the
other models (Table 17). Since we are dealing with low R squared values that differ only slightly
between models, we have to be careful when making exact predictions.
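Both NBR fit measures are simple functions of the model log-likelihood. The Python sketch below shows the textbook definitions of AIC and of the Nagelkerke R squared (the Cox-Snell measure rescaled to a maximum of 1). The function names and the log-likelihood values are hypothetical; the statistical software may report these quantities with slightly different conventions.

```python
import math


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: lower values indicate less information loss."""
    return 2 * n_parameters - 2 * log_likelihood


def nagelkerke_r2(ll_null, ll_full, n_obs):
    """Cox-Snell pseudo R squared rescaled so that its maximum equals 1."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n_obs)
    upper_bound = 1.0 - math.exp(2.0 * ll_null / n_obs)
    return cox_snell / upper_bound


# Hypothetical log-likelihoods of an intercept-only and a full model on 1,000 posts.
ll_null, ll_full, n_obs = -2300.0, -2280.0, 1000
print(aic(ll_full, n_parameters=6))
print(round(nagelkerke_r2(ll_null, ll_full, n_obs), 4))
```

Because AIC depends on the log-likelihood of the data at hand, it can only rank models fitted to the same observations and the same DV, which is why AIC values are compared per DV (reactions, shares, comments) and not across them.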
14.4.2 Estimation results
The estimation results of the MAM are given in Table 17. The estimation results of the IMs are given in
Tables 18 through 22. As shown in the tables, all 36 models are significant as a whole (p < 0.001). In
addition, different effects on BPP have been found for the categorization types. In what follows, we
check the relationship of each categorization with the number of reactions, the number of shares
and the number of comments. Our main analysis is based on the model with all the categorization
variables taken into account. Still, we also look at how each isolated classification approach is
related to BPP on its own.
Table 17: Estimation Results for Brand Post Popularity, Multi Approaches Model
Multi Approaches Model                Reactions (B)           Shares (B)              Comments (B)
                                      OLSR        NBR         OLSR        NBR         OLSR        NBR
(Intercept)                           0.960 **    1.715 **    0.252 **    0.168 **    0.162 **    -0.958 **
Topic approach
  Party                               -0.165 **   -0.018      -0.167 **   -0.634 **   -0.069 **   -0.028
  Food                                -0.111 **   0.174 **    -0.164 **   -0.753 **   -0.057 **   0.086 *
  Performance                         0.039 *     0.466 **    -0.130 **   -0.224 **   0.050 **    0.710 **
Base Content approach
  Information                         0.065 **    0.087 **    -0.016 *    -0.057 *    0.023 **    0.081 **
  Transaction/Entertainment           -0.023 **   0.024 *     -0.021 **   0.095 **    0.144 **    0.267 **
Approach 1: Content
  Information                         x           x           x           x           x           x
  Entertainment                       xx          0.099 **    0.034 **    0.313 **    -0.015 **   0.146 **
  Transaction                         0.109 **    0.363 **    0.125 **    0.827 **    0.144 **    1.026 **
Approach 2: Message Strategy
  Functional                          0.613 *     -0.255      0.586 **    1.064       xx          0.011
  Experiential                        xx          0.092 **    0.041 **    0.229 **    0.030 **    0.142 **
  Emotional                           0.597 **    0.287 *     xx          -0.078      0.178 **    0.061
  Brand Resonance                     0.304 **    0.253 *     xx          -0.048      0.123 *     0.561 *
Approach 3: Marketer's Orientation
  Task-oriented                       -0.028 *    -0.096 **   -0.017 *    -0.143 **   -0.042 **   -0.139 **
  Relationship/Interaction-oriented   0.137 **    0.258 **    0.131 **    0.496 **    0.222 **    0.770 **
  Self-oriented                       0.233 **    0.236 **    xx          -0.164      0.068 **    0.177
Control variables
  Number of words *1                  0.124 **    0.001 **    0.083 **    0.004       0.045 **    0.002 **
  Weekend day                         0.011 *     -0.053 **   -0.061 **   -0.315      -0.015 **   -0.078 **
Performance
  Model sign. *2                      **          **          **          **          **          **
  Adj. R^2                            2.41%       /           2.74%       /           2.51%       /
  AIC                                 /           1,324,565   /           569,160     /           459,845

Unstandardized coefficients are reported in the table
Dependent variable of OLSR is log(dependent+1), dependent variable for NBR is log(dependent)
* p < 0.05, ** p < 0.001
*1: OLSR: number of words is replaced by log(number of words + 1)
*2: OLSR: F-statistic, NBR: Chi square
xx Variable removed after stepwise variable selection
x Information is left out of the model to reduce overlap with the information variable from the base approach
Table 18: Estimation Results for Brand Post Popularity, Topic Approach

Topic Approach                        Reactions (B)           Shares (B)              Comments (B)
                                      OLSR        NBR         OLSR        NBR         OLSR        NBR
(Intercept)                           0.966 **    1.726 **    0.246 **    0.171 **    0.166 **    -0.915 **
  Party                               -0.152 **   0.076 **    -0.184 **   -0.368 **   -0.050 **   0.237 **
  Food                                -0.090 **   0.262 **    -0.198 **   -0.557 **   -0.025 **   0.339 **
  Performance                         0.067 **    0.563 **    -0.166 **   0.024       0.081 **    1.018 **
Control variables
  Number of words *1                  0.110 **    0.001 **    0.104 **    0.009 **    0.038 **    0.005 **
  Weekend day                         0.012 *     -0.057 **   -0.062 **   -0.329 **   -0.015 **   -0.107 **
Performance
  Model sign. *2                      **          **          **          **          **          **
  Adj. R^2                            2.23%       /           2.43%       /           2.02%       /
  AIC                                 /           1,325,087   /           570,905     /           462,134

Unstandardized coefficients are reported in the table
Dependent variable of OLSR is log(dependent+1), dependent variable for NBR is log(dependent)
* p < 0.05, ** p < 0.001
*1: OLSR: number of words is replaced by log(number of words + 1)
*2: OLSR: F-statistic, NBR: Chi square

Table 19: Estimation Results for Brand Post Popularity, Base Content Approach

Base Content Approach                 Reactions (B)           Shares (B)              Comments (B)
                                      OLSR        NBR         OLSR        NBR         OLSR        NBR
(Intercept)                           0.920 **    1.834 **    0.224 **    0.046 **    0.142 **    -0.822 **
  Information                         0.110 **    0.086 **    -0.067 **   -0.335 **   0.054 **    0.020
  Transaction/Entertainment           -0.028 **   0.153 **    -0.035 **   0.089 **    0.025 **    0.544 **
Control variables
  Number of words *1                  0.109 **    0.003 **    0.068 **    0.008 **    0.040 **    0.008 **
  Weekend day                         0.009 *     -0.055 **   -0.062 **   -0.322 **   -0.016 **   -0.087 **
Performance
  Model sign. *2                      **          **          **          **          **          **
  Adj. R^2                            1.60%       /           2.31%       /           1.24%       /
  AIC                                 /           1,328,659   /           571,890     /           464,584

Unstandardized coefficients are reported in the table
Dependent variable of OLSR is log(dependent+1), dependent variable for NBR is log(dependent)
* p < 0.05, ** p < 0.001
*1: OLSR: number of words is replaced by log(number of words + 1)
*2: OLSR: F-statistic, NBR: Chi square
Table 20: Estimation Results for Brand Post Popularity, Content Approach

Content Approach                      Reactions (B)           Shares (B)              Comments (B)
                                      OLSR        NBR         OLSR        NBR         OLSR        NBR
(Intercept)                           0.923 **    1.819 **    0.219 **    0.045 **    0.152 **    -0.689 **
  Information                         0.058 **    0.256 **    -0.074 **   -0.228 **   0.036 **    0.428 **
  Entertainment                       -0.069 **   0.153 **    0.010 *     0.085 **    -0.059 **   0.226 **
  Transaction                         0.099 **    0.412 **    0.129 **    0.951 **    0.156 **    1.289 **
Control variables
  Number of words *1                  0.119 **    0.003 **    0.058 **    0.006 **    0.054 **    0.006 **
  Weekend day                         0.011 *     -0.052 **   -0.061 **   -0.302 **   -0.016 **   -0.061 **
Performance
  Model sign. *2                      **          **          **          **          **          **
  Adj. R^2                            1.65%       /           2.37%       /           1.55%       /
  AIC                                 /           1,328,347   /           570,840     /           464,001

Unstandardized coefficients are reported in the table
Dependent variable of OLSR is log(dependent+1), dependent variable for NBR is log(dependent)
* p < 0.05, ** p < 0.001
*1: OLSR: number of words is replaced by log(number of words + 1)
*2: OLSR: F-statistic, NBR: Chi square

Table 21: Estimation Results for Brand Post Popularity, Message Strategy Approach

Message Strategy Approach             Reactions (B)           Shares (B)              Comments (B)
                                      OLSR        NBR         OLSR        NBR         OLSR        NBR
(Intercept)                           0.943 **    1.902 **    0.197 **    -0.039 **   0.163 **    -0.576 **
  Functional                          0.719 *     -0.007      0.643 **    2.109 **    0.253 *     0.427
  Experiential                        -0.128 **   0.092 **    0.036 **    0.105 **    -0.068 **   0.184 **
  Emotional                           0.691 **    0.506 **    0.120 *     0.033       0.224 **    0.241
  Brand Resonance                     0.486 **    0.517 **    -0.072      -0.385 *    0.186 **    0.789 **
Control variables
  Number of words *1                  0.134 **    0.003 **    0.055 **    0.009 **    0.059 **    0.009 **
  Weekend day                         0.009       -0.056 **   -0.062 **   -0.322 **   -0.017 **   -0.094 **
Performance
  Model sign. *2                      **          **          **          **          **          **
  Adj. R^2                            1.70%       /           2.05%       /           1.20%       /
  AIC                                 /           1,329,148   /           572,216     /           466,172

Unstandardized coefficients are reported in the table
Dependent variable of OLSR is log(dependent+1), dependent variable for NBR is log(dependent)
* p < 0.05, ** p < 0.001
*1: OLSR: number of words is replaced by log(number of words + 1)
*2: OLSR: F-statistic, NBR: Chi square
Table 22: Estimation Results for Brand Post Popularity, Marketeer's Orientation Approach

Marketeer's Orientation Approach      Reactions (B)           Shares (B)              Comments (B)
                                      OLSR        NBR         OLSR        NBR         OLSR        NBR
(Intercept)                           0.938 **    1.913 **    0.197 **    -0.026 *    0.160 **    -0.546 **
  Task-oriented                       -0.122 **   0.070 **    0.037 **    0.092 **    -0.077 **   0.134 **
  Relationship/Interaction-oriented   0.126 **    0.457 **    0.174 **    1.054 **    0.256 **    1.565 **
  Self-oriented                       0.376 **    0.511 **    x           -0.314 **   0.128 **    0.583 **
Control variables
  Number of words *1                  0.133 **    0.003 **    0.055 **    0.008 **    0.061 **    0.008 **
  Weekend day                         0.010       -0.057 **   -0.062 **   -0.320 **   -0.017 **   -0.091 **
Performance
  Model sign. *2                      **          **          **          **          **          **
  Adj. R^2                            1.69%       /           2.07%       /           1.39%       /
  AIC                                 /           1,329,084   /           572,015     /           465,684

Unstandardized coefficients are reported in the table
Dependent variable of OLSR is log(dependent+1), dependent variable for NBR is log(dependent)
* p < 0.05, ** p < 0.001
*1: OLSR: number of words is replaced by log(number of words + 1)
*2: OLSR: F-statistic, NBR: Chi square
x Variable removed after stepwise variable selection

An extra test was needed for the NBR model to check whether each categorization variable is
statistically significant on its own. We did this by comparing the MAM with and without the specific
categorization variable in an ANOVA test. Previous research from Cvijikj & Michahelles (2013) also made
use of this extra test. The main difference between their model and ours is that we allow overlap
between the categorization variables within each approach (e.g. a Facebook post can be classified as
both information and transaction), while this was not allowed in the study from Cvijikj & Michahelles
(2013) (e.g. a message is classified as either entertainment, information or remuneration as content
type). Conducting the ANOVA is less valuable in our study because we consider every categorization
variable as a factor variable on its own (0 or 1), while the content type variable from Cvijikj &
Michahelles (2013) was a factor variable with multiple levels. All categorization variables as well as
the two control variables were found to be significant factors for all types of BPP (p < 0.0001). In
what follows, the effects of the explanatory variables are explained in relation to the number of
reactions, shares and comments. We report the results from the OLSR and the NBR model of the MAM and
only mention the results from the related isolated approach when they deviate from the MAM. Not
mentioning the isolated approach therefore means that the effect is confirmed by the specific IM on its
own. An overview of the estimated results can be found in Tables 18, 19, 20, 21 and 22.

14.4.3 Number of reactions
Topic Approach
Party was found to be a significant factor in the OLSR model, with a negative relationship with
the number of reactions ($\beta_{\text{OLSR(MAM), party}} = -0.165$, p < 0.001), while party was not
significantly related to the number of reactions in the NBR model ($\beta_{\text{NBR(MAM), party}} = -0.018$,
p > 0.05). This is in contrast with the isolated topic model, where party was found to have a
significant and positive effect on the number of reactions ($\beta_{\text{NBR(IM), party}} = 0.076$,
p < 0.001). Providing content about food is significantly and negatively related to the number of
reactions in the OLSR multi approach model ($\beta_{\text{OLSR(MAM), food}} = -0.111$, p < 0.001), but the
NBR model found a significant positive relationship ($\beta_{\text{NBR(MAM), food}} = 0.174$, p < 0.001).
The last topic (performance) was found to be significantly and positively related to the number of
reactions ($\beta_{\text{OLSR(MAM), performance}} = 0.039$, p < 0.05;
$\beta_{\text{NBR(MAM), performance}} = 0.466$, p < 0.001). This effect is confirmed by the isolated topic
model, where the significance was even stronger ($\beta_{\text{OLSR(IM), performance}} = 0.067$, p < 0.001).
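Because the NBR models use a log link, a coefficient can be read as a multiplicative effect on the expected count after exponentiation. The Python sketch below applies this standard transformation to two NBR coefficients from Table 17. The function names are hypothetical, and this reading is an illustration rather than an analysis carried out in this study.

```python
import math


def incidence_rate_ratio(beta):
    """Multiplicative change in the expected count when a dummy switches from 0 to 1."""
    return math.exp(beta)


def percent_change(beta):
    """The same effect expressed as a percentage change in the expected count."""
    return (math.exp(beta) - 1.0) * 100.0


# NBR (MAM) coefficients for the number of reactions, taken from Table 17.
nbr_reactions = {"food": 0.174, "performance": 0.466}
for topic, beta in nbr_reactions.items():
    ratio = incidence_rate_ratio(beta)
    change = percent_change(beta)
    print(f"{topic}: rate ratio = {ratio:.3f}, change = {change:.1f}%")
```

A ratio above 1 corresponds to a positive coefficient and a ratio below 1 to a negative one, so the sign-based discussion in this section carries over directly. The OLSR coefficients cannot be read this way, since their dependent variable is log(dependent + 1).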
Base Content Approach
If we look at the BCA, providing information is significantly and positively related to the number of
reactions ($\beta_{\text{OLSR(MAM), information}} = 0.065$, p < 0.001;
$\beta_{\text{NBR(MAM), information}} = 0.087$, p < 0.001). For transaction/entertainment, the estimated
results of OLSR and NBR contradict each other. While messages about transaction/entertainment are
significantly and negatively related to the number of reactions according to the OLSR method
($\beta_{\text{OLSR(MAM), transaction/entertainment}} = -0.023$, p < 0.001), the NBR model found a
marginally significant positive association between transaction/entertainment and the number of
reactions ($\beta_{\text{NBR(MAM), transaction/entertainment}} = 0.024$, p < 0.05). This positive effect
was confirmed by the isolated NBR model with an even lower p-value
($\beta_{\text{NBR(IM), transaction/entertainment}} = 0.153$, p < 0.001).
Content Approach
The CA splits the transaction/entertainment variable from the BCA into two separate variables. As
mentioned before, to restrict overlap between categorization variables, we left information of
the CA out of the MAM. However, we still looked at how informative posts behave in the isolated
content model. A positive relationship was found between informative messages and the number of
reactions ($\beta_{\text{OLSR(IM), information}} = 0.058$, p < 0.001;
$\beta_{\text{NBR(IM), information}} = 0.256$, p < 0.001). Entertainment proved to be insignificant in the
OLSR model and was removed from it after stepwise variable selection. In contrast, the NBR model found
entertainment to be significantly and positively related to the number of reactions
($\beta_{\text{NBR(MAM), entertainment}} = 0.099$, p < 0.001). In the isolated model, entertainment was
found to be significantly and negatively associated with the number of reactions based on the OLSR
method ($\beta_{\text{OLSR(IM), entertainment}} = -0.069$, p < 0.001). A significant positive association
was found between transaction and the number of reactions
($\beta_{\text{OLSR(MAM), transaction}} = 0.109$, p < 0.001;
$\beta_{\text{NBR(MAM), transaction}} = 0.363$, p < 0.001).
Message Strategy Approach
Posts providing functional content are marginally significant and positively related to the number of
reactions based on the OLSR method ($\beta_{\text{OLSR(MAM), functional}} = 0.613$, p < 0.05). According to
the NBR method, no relationship was found between functional and the number of reactions
($\beta_{\text{NBR(MAM), functional}} = -0.255$, p > 0.05). While posts that stimulate behavioural
responses (experiential) were left out of the OLSR model after stepwise variable selection, the NBR
model found a significant and positive relation between experiential and the number of reactions
($\beta_{\text{NBR(MAM), experiential}} = 0.092$, p < 0.001). In contrast with the MAM, where experiential
was left out of the OLSR model, the isolated OLSR model found a significant negative association with
the number of reactions ($\beta_{\text{OLSR(IM), experiential}} = -0.128$, p < 0.001). Emotional posts were
found to be positively related to the number of reactions
($\beta_{\text{OLSR(MAM), emotional}} = 0.597$, p < 0.001;
$\beta_{\text{NBR(MAM), emotional}} = 0.287$, p < 0.05). The same holds for the relationship between brand
resonance and the number of reactions ($\beta_{\text{OLSR(MAM), brand resonance}} = 0.304$, p < 0.001;
$\beta_{\text{NBR(MAM), brand resonance}} = 0.253$, p < 0.05). The isolated model confirmed the effect of
emotional and brand resonance on the number of reactions, with coefficients that were also significant
at p < 0.001 in the NBR model ($\beta_{\text{NBR(IM), emotional}} = 0.506$, p < 0.001;
$\beta_{\text{NBR(IM), brand resonance}} = 0.517$, p < 0.001).
Marketeer’s Orientation Approach
Relationship/interaction-oriented was found to be significantly and positively related to the number
of reactions ($\beta_{\text{OLSR(MAM), relationship}} = 0.137$, p < 0.001;
$\beta_{\text{NBR(MAM), relationship}} = 0.258$, p < 0.001). The same results were found for self-oriented
posts ($\beta_{\text{OLSR(MAM), self}} = 0.233$, p < 0.001; $\beta_{\text{NBR(MAM), self}} = 0.236$,
p < 0.001). Different results were found for task-oriented posts (e.g. advertising, coupons, discounts
etc.). Task-oriented was found to be marginally significant and negatively associated with the number
of reactions ($\beta_{\text{OLSR(MAM), task}} = -0.028$, p < 0.05). The isolated OLSR approach confirmed
this effect at a stricter significance level ($\beta_{\text{OLSR(IM), task}} = -0.122$, p < 0.001). While
the NBR (MAM) model also found a significant and negative association between task-oriented posts and
the number of reactions ($\beta_{\text{NBR(MAM), task}} = -0.096$, p < 0.001), the isolated NBR model found
a significant and positive relation between task-oriented and the number of reactions
($\beta_{\text{NBR(IM), task}} = 0.070$, p < 0.001).
Control variables
In terms of the control variables, the number of words was found to be significantly and positively
related to the number of reactions ($\beta_{\text{OLSR(MAM), number of words}} = 0.124$, p < 0.001;
$\beta_{\text{NBR(MAM), number of words}} = 0.001$, p < 0.001). A Facebook post published on Saturday or
Sunday was found to be significantly and negatively related to the number of reactions in the NBR model
($\beta_{\text{NBR(MAM), weekend day}} = -0.053$, p < 0.001). However, the OLSR model found a marginally
significant positive effect of weekend days on the number of reactions
($\beta_{\text{OLSR(MAM), weekend day}} = 0.011$, p < 0.05). These effects were confirmed by the isolated
approaches. Only for the MSA and the marketeer’s orientation approach did the OLSR model find no
significant effect of weekend day on the number of reactions
($\beta_{\text{OLSR(IM), weekend day}} = 0.009$, p > 0.05 for the MSA;
$\beta_{\text{OLSR(IM), weekend day}} = 0.010$, p > 0.05 for the marketeer’s orientation approach).
14.4.4 Number of shares
Topic Approach
Messages that were assigned to one of the 3 topics (party, food & performance) are significantly and
negatively related to the number of shares. The unstandardized coefficients for party, food
and performance are all significant with a p-value lower than 0.001 in the OLSR model as well as the
NBR model. Only content about music & concerts (performance topic) was not significantly related to the
number of shares in the isolated model based on NBR
($\beta_{\text{NBR(IM), performance}} = 0.024$, p > 0.05).
Base Content Approach
Information is negatively and marginally significantly related to the number of shares
($\beta_{\text{OLSR(MAM), information}} = -0.016$, p < 0.05;
$\beta_{\text{NBR(MAM), information}} = -0.057$, p < 0.05). This effect is confirmed and reinforced in the
isolated BCA ($\beta_{\text{OLSR(IM), information}} = -0.067$, p < 0.001;
$\beta_{\text{NBR(IM), information}} = -0.335$, p < 0.001). For transaction/entertainment, the outcomes of
the two models (OLSR & NBR) conflict with each other. While OLSR found a significant negative
association between transaction/entertainment and the number of shares of a post, NBR found a
significant positive effect ($\beta_{\text{OLSR(MAM), transaction/entertainment}} = -0.021$, p < 0.001;
$\beta_{\text{NBR(MAM), transaction/entertainment}} = 0.095$, p < 0.001).
Content Approach
Whether a BP is entertainment-related has a significant positive influence on the number of shares
($\beta_{\text{OLSR(MAM), entertainment}} = 0.034$, p < 0.001;
$\beta_{\text{NBR(MAM), entertainment}} = 0.313$, p < 0.001). The same conclusion holds for
transaction-oriented messages ($\beta_{\text{OLSR(MAM), transaction}} = 0.125$, p < 0.001;
$\beta_{\text{NBR(MAM), transaction}} = 0.827$, p < 0.001). If we look at the isolated content model, which
only includes information, entertainment and transaction as well as the two control variables (number
of words & weekend day), information posts were found to be significantly and negatively related to the
number of shares ($\beta_{\text{OLSR(IM), information}} = -0.074$, p < 0.001;
$\beta_{\text{NBR(IM), information}} = -0.228$, p < 0.001). Secondly, entertainment was also found to be
positively related to the number of shares in the isolated OLSR model, but only at a marginal
significance level ($\beta_{\text{OLSR(IM), entertainment}} = 0.010$, p < 0.05).
Message Strategy Approach
Messages with functional content are significantly and positively related to the number of shares
according to the OLSR method ($\beta_{\text{OLSR(MAM), functional}} = 0.586$, p < 0.001), while no
significant association was found in the NBR model ($\beta_{\text{NBR(MAM), functional}} = 1.064$,
p > 0.05). In contrast, the isolated MSA based on NBR found a significant and positive relation between
functional and the number of shares ($\beta_{\text{NBR(IM), functional}} = 2.109$, p < 0.001). Experiential
was found to be a significant positive factor for the number of shares
($\beta_{\text{OLSR(MAM), experiential}} = 0.041$, p < 0.001;
$\beta_{\text{NBR(MAM), experiential}} = 0.229$, p < 0.001). Posts providing emotional and brand resonance
related content were left out of the OLSR (MAM) model after stepwise variable selection. This is in
line with the results from the NBR (MAM) model, where emotional and brand resonance were found to be
insignificant ($\beta_{\text{NBR(MAM), emotional}} = -0.078$, p > 0.05;
$\beta_{\text{NBR(MAM), brand resonance}} = -0.048$, p > 0.05). In the isolated model of the MSA, emotional
BPs were found to be marginally significant and positively related to the number of shares in the OLSR
model ($\beta_{\text{OLSR(IM), emotional}} = 0.120$, p < 0.05), while the NBR method found no significant
relationship ($\beta_{\text{NBR(IM), emotional}} = 0.033$, p > 0.05). Conversely, a message about brand
image and history was found to be marginally significant and negatively related to the number of shares
according to the NBR method ($\beta_{\text{NBR(IM), brand resonance}} = -0.385$, p < 0.05), while the OLSR
method found no significant association between brand resonance and the number of shares
($\beta_{\text{OLSR(IM), brand resonance}} = -0.072$, p > 0.05).
Marketeer’s Orientation Approach
Messages whose content focuses on increasing the interactivity between the followers and the brand
page were found to be significantly and positively associated with the number of shares
($\beta_{\text{OLSR(MAM), relationship}} = 0.131$, p < 0.001;
$\beta_{\text{NBR(MAM), relationship}} = 0.496$, p < 0.001). Self-oriented posts were removed from the MAM
and the isolated model after stepwise reduction for OLSR. In the NBR models, no significant relation
was found between self-oriented and the number of shares for the MAM
($\beta_{\text{NBR(MAM), self}} = -0.164$, p > 0.05), while the isolated model found a significant and
negative relation with the number of shares ($\beta_{\text{NBR(IM), self}} = -0.314$, p < 0.001).
Task-oriented is marginally significant and negatively related to the number of shares in the OLSR
model ($\beta_{\text{OLSR(MAM), task}} = -0.017$, p < 0.05). The NBR model confirmed this effect at an even
stricter significance level ($\beta_{\text{NBR(MAM), task}} = -0.143$, p < 0.001). However, the isolated
model found a significant and positive association between task-oriented content and the number of
shares ($\beta_{\text{OLSR(IM), task}} = 0.037$, p < 0.001; $\beta_{\text{NBR(IM), task}} = 0.092$,
p < 0.001).
Control variables
Posting messages during the weekend was found to be significantly and negatively related to the
number of shares ($\beta_{\text{OLSR(MAM), weekend day}} = -0.061$, p < 0.001;
$\beta_{\text{NBR(MAM), weekend day}} = -0.315$, p < 0.001). In contrast, the number of words of a message
was found to be significantly and positively associated with the number of shares
($\beta_{\text{OLSR(MAM), number of words}} = 0.083$, p < 0.001;
$\beta_{\text{NBR(MAM), number of words}} = 0.004$, p < 0.001).
14.4.5 Number of comments
Topic Approach
Party-related messages are significantly and negatively related to the number of comments based on the
OLSR model ($\beta_{\text{OLSR(MAM), party}} = -0.069$, p < 0.001), while party is not significantly
associated with comments according to the NBR method ($\beta_{\text{NBR(MAM), party}} = -0.028$,
p > 0.05). However, party was found to be significantly and positively related to the number of
comments according to the isolated NBR model ($\beta_{\text{NBR(IM), party}} = 0.237$, p < 0.001), while
the isolated OLSR model confirmed the significant negative relationship found by OLSR
($\beta_{\text{OLSR(IM), party}} = -0.050$, p < 0.001). Food is significantly and negatively related to the
number of comments according to the OLSR model ($\beta_{\text{OLSR(MAM), food}} = -0.057$, p < 0.001),
while food is marginally significant and positively associated with the number of comments based on the
NBR model ($\beta_{\text{NBR(MAM), food}} = 0.086$, p < 0.05). The isolated model confirmed this positive
relationship between food and comments with an even lower p-value
($\beta_{\text{NBR(IM), food}} = 0.339$, p < 0.001). Posts related to performance were significantly and
positively associated with the number of comments in the OLSR model as well as the NBR model
($\beta_{\text{OLSR(MAM), performance}} = 0.050$, p < 0.001;
$\beta_{\text{NBR(MAM), performance}} = 0.710$, p < 0.001).
Base Content Approach
Providing information-related content in Facebook posts is significantly and positively related to the
number of comments ($\beta_{\text{OLSR(MAM), information}} = 0.023$, p < 0.001;
$\beta_{\text{NBR(MAM), information}} = 0.081$, p < 0.001). The same estimated results were found for
transaction/entertainment posts ($\beta_{\text{OLSR(MAM), transaction/entertainment}} = 0.144$, p < 0.001;
$\beta_{\text{NBR(MAM), transaction/entertainment}} = 0.267$, p < 0.001). Compared to the isolated BCA, one
difference was found. No significant relationship was found between information and the number of
comments according to the isolated NBR model ($\beta_{\text{NBR(IM), information}} = 0.020$, p > 0.05).
Content Approach
Information is significantly and positively related to the number of comments in the isolated
models ($\beta_{\text{NBR(IM), information}} = 0.036$, p < 0.001; $\beta_{\text{OLSR(IM), information}} = 0.428$, p < 0.001). The
results of the OLSR and NBR methods for entertainment posts contradict each other. While the OLSR
method found a significant negative relation with the number of comments
($\beta_{\text{OLSR(MAM), entertainment}} = -0.015$, p < 0.001), the NBR model found a positive relation between
entertainment and the number of comments ($\beta_{\text{NBR(MAM), entertainment}} = 0.146$, p < 0.001). Posts about
sweepstakes, bonuses, promotions etc. (transaction content) are significantly and positively associated
with the number of comments ($\beta_{\text{OLSR(MAM), transaction}} = 0.144$, p < 0.001;
$\beta_{\text{NBR(MAM), transaction}} = 1.026$, p < 0.001).
Message Strategy Approach
Functional was left out of the OLSR model after stepwise variable selection. In line with this, no
significant association was found between functional and the number of comments in the NBR model
($\beta_{\text{NBR(MAM), functional}} = 0.011$, p > 0.05). Still, the isolated OLSR model found a marginally
significant positive association between functional and the number of comments
($\beta_{\text{OLSR(IM), functional}} = 0.253$, p < 0.05). BPs concerning sensory stimulation, physical stimulation or
brand events were found to be significantly and positively related to the number of comments
($\beta_{\text{OLSR(MAM), experiential}} = 0.146$, p < 0.001; $\beta_{\text{NBR(MAM), experiential}} = 0.142$, p < 0.001). In
contrast, the isolated OLSR model for the MSA found a significant and negative association between
experiential and the number of comments ($\beta_{\text{OLSR(IM), experiential}} = -0.068$, p < 0.001). Emotion-laden
messages are significantly and positively related to the number of comments in the OLSR model
($\beta_{\text{OLSR(MAM), emotional}} = 0.178$, p < 0.001), while the NBR model found no significant relation
($\beta_{\text{NBR(MAM), emotional}} = 0.061$, p > 0.05). Posts providing brand resonance related content are
marginally significantly and positively linked with the number of comments
($\beta_{\text{OLSR(MAM), brand resonance}} = 0.123$, p < 0.05; $\beta_{\text{NBR(MAM), brand resonance}} = 0.561$, p < 0.05). The
isolated models confirmed the positive association between brand resonance and the number of comments
at a stricter significance level ($\beta_{\text{OLSR(IM), brand resonance}} = 0.186$, p < 0.001;
$\beta_{\text{NBR(IM), brand resonance}} = 0.789$, p < 0.001).
Marketeer’s Orientation Approach
A negative and significant association was found between task-oriented messages and the number of
comments ($\beta_{\text{OLSR(MAM), task}} = -0.042$, p < 0.001; $\beta_{\text{NBR(MAM), task}} = -0.139$, p < 0.001). In contrast,
the isolated NBR model found a significant positive relation between task-oriented messages and the
number of comments ($\beta_{\text{NBR(IM), task}} = 0.134$, p < 0.001). Relationship/interaction-oriented content was
found to be significantly and positively related to the number of comments
($\beta_{\text{OLSR(MAM), relationship}} = 0.222$, p < 0.001; $\beta_{\text{NBR(MAM), relationship}} = 0.770$, p < 0.001). Whereas the
OLSR model found a significant and positive association between self-oriented content and the number
of comments ($\beta_{\text{OLSR(IM), self}} = 0.068$, p < 0.001), no significant relation was found between them in
the NBR model ($\beta_{\text{NBR(MAM), self}} = 0.177$, p < 0.001). In contrast to the insignificant relation found for
the MAM (NBR) model, the isolated NBR model found a positive and significant relation between
self-oriented posts and the number of comments ($\beta_{\text{NBR(IM), self}} = 0.583$, p < 0.001).
Control variables
The control variables behave in relation to the number of comments as they do for the number of
shares. The number of words is significantly and positively associated with the number of comments
($\beta_{\text{OLSR(MAM), numberofwords}} = 0.045$, p < 0.001; $\beta_{\text{NBR(MAM), numberofwords}} = 0.002$, p < 0.001), while
a post created during the weekend is significantly and negatively related to the number of comments
($\beta_{\text{OLSR(MAM), weekendday}} = -0.015$, p < 0.001; $\beta_{\text{NBR(MAM), weekendday}} = -0.078$, p < 0.001).
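The NBR coefficients reported in this section are easier to compare once they are expressed as incidence rate ratios. The following minimal sketch assumes that the coefficients come from a standard negative binomial model with a log link, so that exp(beta) is the multiplicative change in the expected count for a one-unit increase in the variable. The function name is illustrative and not part of the original analysis.

```python
import math


def incidence_rate_ratio(beta: float) -> float:
    """Convert a log-link count-model coefficient into an incidence rate ratio."""
    return math.exp(beta)


# NBR (MAM) coefficients for the number of comments, as reported in this section.
nbr_comment_coefficients = {
    "relationship": 0.770,
    "transaction": 1.026,
    "task": -0.139,
}

irr = {name: incidence_rate_ratio(b) for name, b in nbr_comment_coefficients.items()}

for name, ratio in irr.items():
    # A ratio above 1 means more expected comments, below 1 means fewer.
    change = (ratio - 1.0) * 100.0
    print(f"{name}: IRR = {ratio:.2f} ({change:+.0f}% expected comments)")
```

Under this assumption, a coefficient of the same size has the same multiplicative effect regardless of the baseline count, which is why NBR and OLSR magnitudes cannot be compared directly.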
15 Discussion and managerial implications
15.1 Selection of Classification approach
This section discusses the evaluation results of the topic model and the supervised models in more
detail. Our first content model used the unsupervised topic modelling approach. The three topics
obtained from the model (party, food & performance) were interpretable. Overall, we conclude that topic
modelling is a viable classification approach on its own. However, a topic model depends strongly on the
characteristics of the data: applying the approach to another industry will result in other overarching
topics. For this reason, it is also difficult to compare the approach with the results of the supervised
approaches. Marketeers can use this approach to see which themes and topics dominate their brand
page.
Looking at the evaluation of the supervised approaches, we can draw the following conclusions. First,
splitting the transaction/entertainment variable of the Base Content Approach into separate transaction
and entertainment variables increased the classification performance of the two variables. Managers who
want to take content classification into account should therefore use the CA instead of the BCA. Secondly,
the functional variable had poor prediction accuracy compared to the other variables of the Message
Strategy Approach. This is mainly because our data comes from the bar industry, where functional posts
appear less often. This is also why the MSA was dominated by the experiential category. We would
recommend this approach to managers only if the characteristics of their industry permit it (e.g. this
categorization is better suited to a manufacturing company, since posts mentioning functional claims are
more common there than in the food or drink industry). The last approach (MOA) takes the mindset and
strategy of the marketeers as a starting point and focuses less on the content, while the MSA is closer to
the Content Approach since it focuses on the strategy of the messages themselves. Managers who want to
classify their posts at a higher strategic level and a lower content level can use the MOA. Overall, all
four approaches are practicable for automatic content classification. The preferable approach in
practice depends on the following two factors.
(1) Industry: The most suitable approach depends mainly on the industry of the company. In our case,
it is clear that more informal categorization variables (e.g. entertainment & experiential) perform better
than more formal classification types (e.g. functional), due to the characteristics of bars & restaurants.
Their posts address customers more informally than, for instance, those of Volvo, which would post about
a new feature online.
(2) Preferred viewpoint: Do the managers & marketeers want to classify posts by taking the
content as a starting point (CA or BCA), or do they prefer to classify posts by taking their own perceived
scheme of online engagement with customers as a starting point (MOA)? A third possible approach
takes the strategy of different messages as a starting point (MSA).
15.2 Enhancing Brand post popularity
The results of the predictive impact analysis have shown that not all categorization types have a
significant effect on the number of reactions, shares or comments. Moreover, the variables from the
different classification approaches have shown different effects on BPP. In what follows, we give a more
practical explanation of how marketeers or managers can use this research as a guideline for increasing
online engagement. Exhibit 6 explains the method we used to decide which variables are worth
mentioning in this section. Such a method was needed because some estimated results contradicted
each other (e.g. a topic was found to be negatively related to the number of reactions by the OLSR model,
while the NBR model found a positive association between that topic and the number of reactions). Two
remarks should be kept in mind. First, our results are derived from the bar and restaurant industry; the
relationships between categorization variables and BPP may deviate in other sectors.
Secondly, the low R-squared of the models made it impossible to make exact predictions.
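Contradictions between the two estimation methods can be screened mechanically. The sketch below is not the rule of Exhibit 6, which is not reproduced here; it is one plausible rule, assumed for illustration, that keeps a variable only when both models are significant at the 5% level and agree on the sign. The names `Estimate` and `combined_direction` are hypothetical.

```python
from typing import NamedTuple


class Estimate(NamedTuple):
    coefficient: float
    significant: bool  # True when the reported p-value is below 0.05


def combined_direction(olsr: Estimate, nbr: Estimate) -> str:
    """Hypothetical screening rule: report a direction only when both models
    are significant and agree on the sign; otherwise flag it as unclear."""
    if not (olsr.significant and nbr.significant):
        return "unclear"
    if olsr.coefficient > 0 and nbr.coefficient > 0:
        return "positive"
    if olsr.coefficient < 0 and nbr.coefficient < 0:
        return "negative"
    return "unclear"


# MAM estimates for the number of comments, taken from Section 14.4.5.
comments = {
    "performance": (Estimate(0.050, True), Estimate(0.710, True)),
    "entertainment": (Estimate(-0.015, True), Estimate(0.146, True)),
    "emotional": (Estimate(0.178, True), Estimate(0.061, False)),
}

verdicts = {name: combined_direction(o, n) for name, (o, n) in comments.items()}
print(verdicts)
```

A stricter or looser rule (for example, accepting a variable when a single model is significant) would change which variables are reported, so the choice of rule should be stated alongside the conclusions.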
15.2.1 Enhancing the number of reactions
Managers who want to enhance BPP by increasing the number of reactions (likes) should
post content about concerts & music (performance topic). A possible reason is that people feel
happier when one of their favourite bars organizes a concert they do not want to miss. Moderators
focusing on the content of the messages should post information or transaction-related content to
increase the number of reactions. A possible explanation for the positive association between
information and the number of reactions is that followers appreciate the extra information the
bar or restaurant posts (e.g. information about the opening hours). Moreover, when marketeers
post transaction-related content (sweepstakes, deals, bonuses, discounts etc.), they sometimes require
followers to like the post in order to make use of the promotion mentioned. Managers who rely on the
MSA for content classification should post emotional and brand resonance related messages to
increase the number of reactions. Customers probably feel more connected to emotion-laden posts,
which increases the probability that they will like the post. Additionally, to increase the number of
reactions, marketeers should post self-oriented or relationship/interaction-oriented content. For
interaction, the result is quite intuitive since these posts contain votes and contests or ask for feedback
(e.g. you have to like the post to enter the contest). Posting during the week is beneficial for the number
of reactions. This might be because people are busier during the weekend, with a fully booked
schedule, while they probably have more “me time” to check SM during the week after a long day of
work. Our research further indicates that longer messages have a positive impact on the
number of reactions.
15.2.2 Enhancing the number of shares
Managers who want to enhance online engagement by increasing the number of shares should not focus
on the topic of the messages: party, food and performance are negatively related to the number of shares.
This might be explained by the fact that topics are formed by an unsupervised approach that extracts
overarching subjects from the messages. These topics therefore differ from the pre-formed
categorization variables, which are sometimes more outcome-oriented and less content-oriented (e.g. the
goal of the relationship/interaction-oriented variable from the MOA is to increase interaction, which
translates automatically into a higher BPP). In contrast to its positive effect on the number of reactions
and comments, information has a negative effect on the number of shares. From personal
experience, if you want to show an interesting information post to one of your friends, you will prefer
to tag him or her in the comments rather than share it with all of your followers on your personal page.
Furthermore, managers should post entertainment and transaction-related content to increase the number
of shares. Within the MSA, emotional and brand resonance are unrelated to the number of shares, while
both variables were positively related to the number of reactions. A possible explanation is that more
informal posts (emotional & brand resonance) trigger likes rather than shares since
customers feel more personally connected to these kinds of posts (e.g. a follower is more likely
to like a post about an emotional goodbye of one of the bartenders than a functional explanation
of a product). On the other hand, functional and experiential are positively related to the number of shares.
Experiential posts are probably shared more since they aim to encourage physical & sensory
stimulation. In the bar industry, these types of posts are most often combined with a small contest
or sweepstake, which stimulates people to share a post in order to win a prize. A post focusing on
relationship/interaction-oriented content will enhance the number of shares as well. The same
suggestions as for the number of reactions can be made for the control variables. Moderators or managers
should not post on a Saturday or a Sunday if they want to increase the number of shares. Secondly,
posting longer messages will also enhance the number of shares.
15.2.3 Enhancing the number of comments
For the relation between the topics and the number of comments, the same conclusion can be
drawn as for the number of reactions. Managers who want to increase the number of comments by using
one of the topics should post about concerts & gigs. Furthermore, information and
transaction/entertainment have a positive effect on the number of comments, as does transaction-related
content on its own. The effect of entertainment on the number of comments remains unclear under the
method of Exhibit 6. Followers may be likely to comment on transaction posts because
they tag their friends to let them know about the promotions of the bar. Furthermore, brand
resonance has a positive association with the number of comments. In our dataset, most of
the brand resonance posts consisted of wishing internal staff a happy birthday. A possible
explanation is therefore that customers of the bar also wish the staff member a happy birthday
by commenting on the post. Additionally, no effect was found between functional and the number of
comments. Moderators who focus on the MOA can also post relationship/interaction-oriented and
self-oriented messages to increase the number of comments (the same results were found for the number
of reactions). Lastly, in line with the results for the number of reactions
and shares, marketeers should post during the week and post longer messages to increase the number of
comments. Longer messages are probably richer in information, so people will feel
more attached to them.
16 Summary
This work aims to contribute empirically to a better understanding of how Facebook messages can be
categorized and creates opportunities for managers to stimulate online engagement. More specifically,
we first looked at how (content) classification approaches can be used to classify Social
Media posts. Secondly, we looked at which approach is best suited to automatic content
classification. Performance was measured with Random Forest for the supervised approaches,
while topic modelling was used as an unsupervised classification approach. Finally, we looked
at which categorization factors influence brand post popularity, measured by the number of reactions,
shares and comments. Our focus was on the bar/restaurant industry. The classification approaches
listed below were drawn from previous literature. For researchers, the proposed classification
frameworks offer a starting point for categorizing SM messages. They can also help managers to take a
closer look at which classification approach is best aligned with their strategy.
(0) Base Content Approach (information & entertainment/transaction)
(1) Content Approach (information, entertainment & transaction)
(2) Message Strategy Approach (functional, experiential, emotional & brand resonance)
(3) Marketeer’s Orientation Approach (task-oriented, relationship/interaction-oriented & self-
oriented)
(4) Viral Marketing Rules Approach (promotion, product, entertainment & event)
Which automatic content classification approach a manager should adopt depends mainly
on the industry and the preferred viewpoint. The Message Strategy Approach is less suitable for the bar
industry than the (Base) Content Approach and the Marketeer’s Orientation Approach because of
the functional category. Results showed that marketeers who want to take the content viewpoint as a
starting point for message classification should use the Content Approach rather than the Base
Content Approach, since the classification performance of the entertainment/transaction variable
increased when it was split into two separate categories. Furthermore, we do not recommend the
Viral Marketing Rules Approach to managers who want to compare or consider all the different
classification approaches, because of its overlap with the other, more distinctive
approaches. Finally, marketeers who want to know the underlying themes of their brand page posts can
use topic modelling.
Moreover, this paper analysed the characteristics of the different classification approaches that might
influence online customer engagement, measured by brand post popularity (number of likes, comments
& shares). Our results showed different relationships between the categorization variables of the different
approaches and the number of reactions, comments & shares. Transaction-oriented content (Content
Approach) and relationship/interaction-oriented content (Marketeer’s Orientation Approach) showed a
positive relationship with brand post popularity, probably because these categorization types
encourage people more strongly to interact online. Furthermore, information (Base Content Approach &
Content Approach) and performance (Topic Approach) showed a positive relation with the number of
reactions and comments, but a negative association with the number of shares. Posts that are more
company-oriented (self-oriented from the Marketeer’s Orientation Approach and brand resonance from
the Message Strategy Approach) are positively related to the number of reactions and comments. Finally,
posts created on weekdays increase the level of brand post popularity, and longer posts
(containing more words in the message) are likewise associated with a higher level of engagement. These
findings should give marketeers and managers better insight into the (automatic) content classification
of Facebook messages. Furthermore, these findings should also stimulate them to investigate whether
categorization types originating from other classification approaches that companies currently use
also influence online engagement with customers.
17 Limitations and further research
This research has some limitations that can be addressed in further research. First of all, our
data comes from bar and restaurant companies. As mentioned before, some approaches and variable
categories are clearly more useful in the manufacturing or transport industry than in the bar
and restaurant industry. Further research could examine in more detail how each
approach applies to different industries.
Secondly, the 1000 labelled posts were classified through human coding by only one person. As mentioned
in the human coding classification section, assigning posts to a specific category type is quite subjective.
Previous research has used workers of Amazon Mechanical Turk (AMT) to classify
posts. AMT is a crowdsourcing marketplace for simple tasks. It enables the use of human intelligence
to perform tasks that computers cannot execute (Lee et al., 2018; Stephen et al., 2015).
Since this is costly and would make us dependent on others, we did not use this technique. Still, to
improve the classification of our different approaches, further research could use this technique
or have more people classify a bigger labelled data set in order to test the inter-coder consistency
(Ashley & Tuten, 2015).
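Inter-coder consistency between two coders is commonly quantified with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The sketch below is a minimal illustration of that statistic, not a procedure used in this study; the labels are invented toy data and the function name is illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same posts."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both coders must label the same non-empty set of posts.")
    n = len(labels_a)
    # Share of posts on which both coders chose the same category.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coder's category frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Toy example with Content Approach categories (hypothetical labels).
coder_1 = ["information", "transaction", "entertainment", "information",
           "transaction", "entertainment", "information", "transaction"]
coder_2 = ["information", "transaction", "entertainment", "information",
           "entertainment", "entertainment", "information", "information"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

With more than two coders, a multi-rater statistic such as Fleiss' kappa would be the analogous choice.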
Thirdly, our research focused on online engagement with brand pages by analysing the influence of the
different approaches on BPP. Little research has yet analysed which content drives
sales, profitability or consumer purchases. The study by Goh et al. (2013) analysed the impact of
content on customers’ repeat purchase behaviour, while the study by Rishika et al. (2013) analysed
the impact on profitability. Further research could analyse the effect of our proposed approaches on the
sales of the bars and restaurants.
Furthermore, our model was restricted in the number of valuable IVs. Only the classification variables
of the different approaches and two extra control variables were included in our model. As mentioned
before, whether a person likes, comments on or shares a BP obviously depends
on more factors (e.g. social contagion, the mood of the person or even the weather outside). Additionally,
posts in our data with empty messages were treated as “empty” content; this is probably because
these posts shared media-related content (e.g. videos, links, photos, images etc.). But
since no data was available on the media types used for the BPs, we could not include the media type
element in our models. Moreover, we did not adjust our DV for the number of followers of the specific
BPs. The more people follow a page, the higher the chance of receiving more likes or comments
(Cvijikj & Michahelles, 2013). Emoticons used in the messages were also removed from the posts.
Further analysis could take emoticons into account when analysing the effects of messages on online
engagement. We are convinced that a richer dataset, taking more outside factors into account, will
increase the performance of our models.
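One way to address the follower-count limitation is to express engagement relative to audience size. In a count model such as NBR this is usually done by adding the logarithm of the number of followers as an offset; the simplest descriptive equivalent is a rate per 1,000 followers. The sketch below shows only that descriptive rate, with hypothetical page names and numbers that are not taken from the dataset.

```python
def engagement_per_1000_followers(interactions: int, followers: int) -> float:
    """Interactions (likes, shares or comments) per 1,000 page followers."""
    if followers <= 0:
        raise ValueError("followers must be positive")
    return 1000.0 * interactions / followers


# Hypothetical posts: the larger page collects more comments in absolute
# terms but fewer relative to its audience.
posts = [
    {"page": "bar_a", "comments": 40, "followers": 20000},
    {"page": "bar_b", "comments": 12, "followers": 1500},
]

rates = {
    p["page"]: engagement_per_1000_followers(p["comments"], p["followers"])
    for p in posts
}
print(rates)
```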
Finally, our research is limited to content from Facebook. Over the last year, other platforms
have gained importance for online engagement between companies and their customers. At the
moment, Instagram is “the place to be” when it comes to social engagement. Further research could
examine how the different classification approaches behave on Instagram, in order to allow companies to
make efficient use of all their SM platforms.
Although our research has some limitations, we are convinced that this study is valuable to the literature
of content classification of SM messages. Our study can be used as a guideline for further research that
involves Social Media content classification.
References
2,7 miljoen Europeanen zijn getroffen door privacyschandaal Facebook. (2016, April 6). HLN. Retrieved from https://www.hln.be/nieuws/buitenland/2-7-miljoen-europeanen-zijn-getroffen-door-privacyschandaal-facebook~a6dfde33/
A gentle introduction to topic modeling using R. (2015). Retrieved 15 February 2019, from Eight to Late website: https://eight2late.wordpress.com/2015/09/29/a-gentle-introduction-to-topic-modeling-using-r/
Arnold, T. W. (2010). Uninformative Parameters and Model Selection Using Akaike’s Information Criterion. Journal of Wildlife Management, 74(6), 1175–1178. Retrieved from http://www.bioone.org/doi/abs/10.2193/2009-367
Ashley, C., & Tuten, T. (2015). Creative Strategies in Social Media Marketing: An Exploratory Study of Branded Social Content and Consumer Engagement. Psychology and Marketing, 32(1), 15–27.
Berger, J., & Milkman, K. L. (2012). What Makes Online Content Viral? Journal of Marketing Research, XLIX, 192–205. Retrieved from www.marketingpower.com/jmr_
Berk, R. A. (2007). Random Forests. In Statistical Learning from a Regression Perspective (second edi, pp. 205–258). Retrieved from http://www.springer.com/series/417
Breiman, L. (2001). Random Forests. Machine Learning, 45, 5–32.
Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and Applications. New York: Cambridge University Press.
Chen, J. (2019). Business to Business (B2B). Retrieved from Investopedia website: https://www.investopedia.com/terms/b/btob.asp
Cvijikj, I. P., & Michahelles, F. (2013). Online engagement factors on Facebook brand pages. Social Network Analysis and Mining, 3(4), 843–861.
Davis, R., Piven, I., & Breazeale, M. (2014). Conceptualizing the brand in social media community: The five sources model. Journal of Retailing and Consumer Services, 21, 468–481.
De Pelsmacker, P., & Van Kenhove, P. (2007). Marktonderzoek: methoden en toepassingen (2nd ed.). Pearson Education Benelux.
de Vries, L., Gensler, S., & Leeflang, P. S. H. (2012). Popularity of Brand Posts on Brand Fan Pages: An Investigation of the Effects of Social Media Marketing. Journal of Interactive Marketing, 26, 83–91.
De Vries, N. J., & Carlson, J. (2014). Examining the drivers and brand performance implications of customer engagement with brands in the social media environment. Journal of Brand Management, 21(6), 495–515.
Definition of entertainment. (n.d.). Retrieved 15 December 2018, from Oxford dictionaries website: https://en.oxforddictionaries.com/definition/entertainment
Donges, N. (2018). The Random Forest Algorithm. Medium. Retrieved from https://towardsdatascience.com/the-random-forest-algorithm-d457d499ffcd
Feinerer, I. (2018). Introduction to the tm Package Text Mining in R. Retrieved from https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf
Gareth, J., Witten, D., Hastie, T., & Tibshirani, R. (2017). An Introduction to Statistical Learning with Applications in R. In A modern approach to regression with R. Retrieved from http://books.google.com/books?id=9tv0taI8l6YC
Global social network penetration rate as of January 2019, by region. (2019). Retrieved 25 January 2019, from Statista website: https://www.statista.com/statistics/269615/social-network-penetration-by-region/
Goh, K.-Y., Heng, C.-S., & Lin, Z. (2013). Social media brand community and consumer behavior: Quantifying the relative impact of user- and marketer-generated content. Information Systems Research, 24(1), 88–107.
Gummerus, J., Liljander, V., Weman, E., & Pihlström, M. (2012). Customer engagement in a Facebook brand community. Management Research Review, 35(9), 857–877.
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Diagnostic Radiology, 143(1), 29–36. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/7063747
Hopkins, D. J., & King, G. (2010). A Method of Automated Nonparametric Content Analysis for Social Science. In American Journal of Political Science (Vol. 54).
How to Interpret a Regression Model with Low R-squared and Low P values. (2014). Retrieved 2 May 2019, from The Minitab Blog website: https://blog.minitab.com/blog/adventures-in-statistics-2/how-to-interpret-a-regression-model-with-low-r-squared-and-low-p-values
Jahn, B., & Kunz, W. (2012). How to transform consumers into fans of your brand. Journal of Service Management, 23(3), 344–361.
Kim, D. H., Spiller, L., & Hettche, M. (2015). Analyzing media types and content orientations in Facebook for global brands. Journal of Research in Interactive Marketing, 9(1), 4–30.
Kiráľová, A., & Pavlíčeka, A. (2015). Development of Social Media Strategies in Tourism Destination. Procedia - Social and Behavioral Sciences, 175, 358–366.
Klema, V. C., & Laub, A. J. (1980). The Singular Value Decomposition: Its Computation and Some Applications. Transactions on Automatic Control, 25(2), 164–176.
Kremers, B. (n.d.). Electronic Word Of Mouth presents a window of opportunity for businesses. Retrieved 11 October 2018, from BuzzTALK website: https://www.buzztalkmonitor.com/blog/electronic-word-of-mouth-presents-a-window-of-opportunity-for-businesses/
Krug, S. (2016). Reactions Now Available Globally. Newsroom Facebook. Retrieved from https://newsroom.fb.com/news/2016/02/reactions-now-available-globally/
Larivière, B., & Van Den Poel, D. (2005). Predicting customer retention and profitability by using random forests and regression forests techniques. Expert Systems with Applications, 29, 472–484.
Lee, D., Hosanagar, K., & Nair, H. S. (2018). Advertising Content and Consumer Engagement on Social Media: Evidence from Facebook. Management Science, 64(11), 5105–5131.
Liaw, A., & Wiener, M. (2002). Classification and Regression by RandomForest. ResearchGate, 2, 18–22. Retrieved from https://www.researchgate.net/publication/228451484
Liu, Y., & Shrum, L. J. (2002). What Is Interactivity and Is It Always Such a Good Thing? Implications of Definition, Person, and Situation for the Influence of Interactivity on Advertising Effectiveness. Journal of Advertising, 31(4), 53–64.
Mangiafico, S. (2019). Functions to Support Extension Education Program Evaluation. Retrieved from http://rcompanion.org
Market-Revenue Per Internet User. (2019). Retrieved 25 January 2019, from Statista website: https://www.statista.com/outlook/220/100/social-media-advertising/worldwide#market-revenuePerInternetUser
Meire, M., Ballings, M., & Van den Poel, D. (2016). The added value of auxiliary data in sentiment analysis of Facebook posts. Decision Support Systems, 89, 98–112.
Most popular social networks worldwide as of April 2019, ranked by number of active users (in millions). (2019). Retrieved 25 January 2019, from Statista website: https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/
Murzintcev, N. (2019). Tuning of the Latent Dirichlet Allocation Models Parameters Description Estimates the best fitting number of topics. Retrieved from https://cran.r-project.org/web/packages/ldatuning/ldatuning.pdf
NEGATIVE BINOMIAL REGRESSION | R DATA ANALYSIS EXAMPLES. (n.d.). Retrieved 3 April 2019, from UCLA: Institute for Digital Research and Education website: https://stats.idre.ucla.edu/r/dae/negative-binomial-regression/
NEGATIVE BINOMIAL REGRESSION | STATA DATA ANALYSIS EXAMPLES. (n.d.). Retrieved 3 April 2019, from UCLA Institute for Digital Research and Education website: https://stats.idre.ucla.edu/stata/dae/negative-binomial-regression/
Netzer, O., Feldman, R., Goldenberg, J., & Fresko, M. (2012). Mine Your Own Business: Market-Structure Surveillance Through Text Mining. In Marketing Science (Vol. 31).
Number of monthly active Facebook users worldwide as of 1st quarter 2019 (in millions). (2019). Retrieved 25 January 2019, from Statista website: https://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/
Number of social media users worldwide from 2010 to 2021 (in billions). (2019). Retrieved 25 January 2019, from Statista website: https://www.statista.com/statistics/278414/number-of-worldwide-social-network-users/
Porterfield, A. (2011, December). 4 Ways to Convert Facebook Fans Into Super Fans. Mashable. Retrieved from https://mashable.com/2011/12/12/facebook-fans-super-fans/?europe=true
Random Forests. (n.d.). Retrieved 21 March 2019, from Metagenomics. Statistics. website: https://dinsdalelab.sdsu.edu/metag.stats/code/randomforest.html
Rishika, R., Kumar, A., Janakiraman, R., & Bezawada, R. (2013). The Effect of Customers’ Social Media Participation on Customer Visit Frequency and Profitability: An Empirical Investigation. Information Systems Research, 24(1), 108–127.
Sabate, F., Berbegal-Mirabent, J., Cañabate, A., & Lebherz, P. R. (2014). Factors influencing popularity of branded content in Facebook fan pages. European Management Journal, 32, 1001–1011.
Sentiment Analysis. (n.d.). Retrieved 20 January 2019, from Techopedia website: https://www.techopedia.com/definition/29695/sentiment-analysis
Setty, S., Jadi, R., Shaikh, S., Mattikalli, C., & Mudenagudi, U. (2014). Classification of Facebook News Feeds and Sentiment Analysis. International Conference on Advances in Computing, Communications and Informatics (ICACCI), 18–23. Institute of Electrical and Electronics Engineers Inc.
Shen, B., & Bissell, K. (2013). Social Media, Social Me: A Content Analysis of Beauty Companies’ Use of Facebook in Marketing and Branding. Journal of Promotion Management, 19(5), 629–651.
Silge, J., & Robinson, D. (2017). Text Mining with R: A Tidy Approach. Retrieved from https://www.tidytextmining.com
Social Media Statistics & Facts. (2019). Retrieved 19 February 2019, from Statista website: https://www.statista.com/topics/1164/social-networks/
Stephen, A. T., Sciandra, M. R., & Inman, J. J. (2015). Is it What You Say or How You Say It? How Content Characteristics Affect Consumer Engagement with Brands on Facebook.
Swani, K., Brown, B. P., & Milne, G. R. (2014). Should tweets differ for B2B and B2C? An analysis of Fortune 500 companies’ Twitter communications. Industrial Marketing Management, 43, 873–881.
Swani, K., Milne, G. R., Brown, B. P., Assaf, A. G., & Donthu, N. (2017). What messages to post? Evaluating the popularity of social media communications in business versus consumer markets. Industrial Marketing Management, 62, 77–87.
Tafesse, W. (2015). Content strategies and audience response on Facebook brand pages. Marketing Intelligence and Planning, 33(6), 927–943.
Tafesse, W., & Wien, A. (2017). A framework for categorizing social media posts. Cogent Business and Management, 4(1).
Tomaras, P., & Ntalianis, K. (2015). Evaluating the Impact of Posted Advertisements on Content Sharing Sites: An Unsupervised Social Computing Approach. Procedia - Social and Behavioral Sciences, 175, 219–226.
Your Easy Guide to Latent Dirichlet Allocation. (2018). Medium. Retrieved from https://medium.com/@lettier/how-does-lda-work-ill-explain-using-emoji-108abf40fa7d
Zhang, Y., Moe, W. W., & Schweidel, D. A. (2017). Modeling the role of message content and influencers in social media rebroadcasting. International Journal of Research in Marketing, 34, 100–119.
Zhao, W. X., Jiang, J., Weng, J., He, J., & Lim, E.-P. (2011). Comparing Twitter and Traditional Media Using Topic Models. The School of Information Systems at Institutional Knowledge at Singapore Management University.
Appendix
EXHIBIT 1 Literature review
EXHIBIT 2 Random ID check of Facebook pages
EXHIBIT 3 Distribution plot of dependent variables
EXHIBIT 4 Descriptive overview
EXHIBIT 5 Performance of information RF model
EXHIBIT 6 Method for overall estimation results
EXHIBIT 1 Literature review

| Authors | x marks (Main focus: media elements (photos, links, videos, …), content itself, type of content; Comparison of different categorization techniques; Learning approach: unsupervised, supervised; Dependent variables: engagement, sales) | Research method (predictive/descriptive/exploratory...) | Industry | Data source | Categorization | Extra information |
|---|---|---|---|---|---|---|
| Ashley & Tuten (2015) | xxx | predictive | InterBrand's Best Global Brands | Facebook, Myspace, Twitter, blogs & forums | IV: Tweets, number of channels, resonance, animation, user image appeal, exclusivity appeals, functional appeals, experiential appeals, emotional appeals, social cause & incentive to share content. DV: Number of people following, Facebook fans, social influence, followers & Engagement score. | Focus on which social media channels & creative strategies are used and how they are related to consumer engagement (score EngagementDB). Focus on correlation with engagement score. |
| Berger & Milkman (2012) | xxx | predictive | | New York Times articles | Content: Anger, anxiety, sadness, awe, emotionality, positivity. DV: Position on most e-mailed list. | Study 1: Field study of emotions and virality of NYT. Study 2: How high-arousal emotions affect transmission. Study 3: How deactivating emotions affect transmission. Control variables (practical utility, interesting & surprising). |
| Cvijikj & Michahelles (2013) | xxxx | predictive | FMCG (food/beverages) | Facebook | Content: Entertainment, information, remuneration, vividness, interactivity & posting time (workday & peak hours). DV: Likes, comments, shares and interaction duration (engagement). | Which content should be posted and when. |
| Davis et al. (2014) | x | qualitative | / | Facebook | Categorization: Functional brand consumption, emotional brand consumption, self-oriented brand consumption, social brand consumption and relational brand consumption. | Five Sources Model. Qualitative research through focus groups & offline interviews. |
| De Vries & Carlson (2014) | x | qualitative | product & service | Facebook (survey related) | Categorization: Functional value, hedonic value, social value & co-creation value. | Qualitative research through questionnaire. Focus on customer engagement. |
| de Vries et al. (2012) | xxxx | predictive | 6 product categories: food, accessories, leisure wear, alcoholic beverages, cosmetics & mobile phones. | Facebook | Content: Vividness, interactivity, informational content, entertainment content, position & valence of comments. DV: Likes & comments (brand post popularity). | Which content should be posted. |
| Goh et al. (2013) | xxx | predictive | retailer | Facebook | Content: information richness (informative effect) & valence (persuasive effect). DV: Total purchase expenditure. | User-generated content vs. marketer-generated content. Directed communication vs. undirected communication. |
| Gummerus et al. (2012) | x | qualitative | gaming | Facebook (survey related) | Categorization: social, entertainment & economic. | Qualitative research through questionnaire. Focus on loyalty. |
| Hopkins & King (2010) | xx | predictive | presidential election | Blog | Content: extremely negative, negative, neutral, positive, extremely positive, no opinion and not a blog. | Focus on automated content analysis. Introducing a method for estimating document category proportions. |
| Jahn & Kunz (2012) | x | qualitative | different industries | Facebook (survey related) | Categorization: content-oriented (functional value & hedonic value), relationship-oriented (social interaction value & brand interaction value) & self-oriented (self-concept value). | Qualitative research through questionnaire. Focus on satisfaction & loyalty. |
| Kim et al. (2015) | xxxx | predictive | Five product categories: convenience, shopping, specialty, industrial & services. (Best Global Brands 2012) | Facebook | IV: Task-oriented (e.g. new product launch, advertising, online coupons/discounts/contests), interaction-oriented (e.g. picture, image, video, personal statement, special event, opinion, talks about season/weather or entertainment, asking for likes/comments/shares/answers) & self-oriented (e.g. company information, brand event, employee mentions). DV: likes, comments & shares. | Five product categories: convenience, shopping, specialty, industrial & services. |
| Lee et al. (2018) | xxxx | predictive | Interbrand's best global 100 brands, 6 categories: celebrities and public figures, entertainment, consumer products and brands, organizations and company, websites & local places and businesses | Facebook | Content: Directly informative content & brand personality-related content. DV: Likes, comments, shares and click-throughs. | Takes EdgeRank into account. Techniques used: Amazon Mechanical Turk & natural language processing. In-depth analysis of the 2 categorizations. |
| Meire et al. (2016) | xx | predictive | sport (soccer) | Facebook | IV: features of focal post (lexicon, lexical, syntactic & time) & auxiliary features (leading & lagging). | Focus on sentiment analysis. |
| Netzer et al. (2012) | xx | descriptive | Case 1: sedan cars. Case 2: diabetes drugs. | Forums | | Relationship between user-generated content and market structure. |
| Rishika et al. (2013) | xx | predictive | retailer (wine) | Facebook | IV: Customers' participation ("fan"). DV: The intensity of the customer-firm relationship (visit frequency) & profitability. | Interaction effects with customer characteristics: Purchase amount, Focus of buying, Deal sensitivity, Share of premium products, Loyalty, Age, Gender, Income & Race. |
| Sabate et al. (2014) | xxxx | predictive | Spanish travel agencies | Facebook | IV: richness (images, videos & links), time frame (day of the week, time of publication). DV: popularity (number of likes & number of comments). | Control variables: length of the wall post & number of followers. |
| Setty et al. (2014) | xxx | predictive | different industries | Facebook | Categorization: liked pages posts, entertainment posts & life event posts. | Focus on classification of Facebook posts and automatic sentiment analysis. |
| Shen & Bissell (2013) | xxx | exploratory | Cosmetic (beauty) | Facebook | IV: Event, product (e.g. product launch, product reviews and tips), promotion (e.g. coupon, sample and giveaways) & entertainment (e.g. beauty poll, Q&A, survey, activity with reward and applications, application services within the Facebook page). DV: likes & comments. | |
| Stephen et al. (2015) | xxxx | predictive | 4 industries: consumer-packaged goods, restaurants, retail & sports | Facebook | Content: arousal-oriented, persuasion-oriented, information & calls to action. DV: Attitudinal responses (likes & negatives) & marketing outcomes (brand exposure (reach), feedback (comments), word of mouth (shares) & website traffic referrals (clicks)). | Focus on what (information characteristics) is said and how (persuasion characteristics) it is said. Taking audience mix into consideration (core vs. core + noncore fans). |
| Swani et al. (2014) | xx | predictive | Fortune 500 | Twitter | Categorization: Corporate brand name, product brand name, functional appeals, emotional appeals, direct calls to purchase, information search & hashtags. | Focus on differences between B2B and B2C. |
| Swani et al. (2017) | xxx | predictive | Fortune 500 | Facebook | IV: Brand cue (corporate name & product name), message appeal (functional appeal & emotional appeal), selling strategy & information search. DV: Likes & comments. | Focus on differences between B2B and B2C. |
| Tafesse (2015) | xxxx | predictive | automobile | Facebook | IV: Vividness, interactivity, novelty, brand consistency & content type (transactional, informational & entertainment). DV: likes & shares. | Control variables (fan numbers, posting date & vehicle category). |
| Tafesse & Wien (2017) | xxx ** | qualitative content analysis (deductive, inductive & validation coding) | InterBrand's Best Global Brands | Facebook | Categorization: Emotional, functional, educational, brand resonance, experiential, current event, personal, employee, brand community, customer relationship, cause-related & sales promotion. | Qualitative content analysis. Also a summary of previous literature based on proposed categorizations. |
| Zhang et al. (2017) | xx | descriptive (factor analysis) | business schools | Twitter | Content: 3 factors: School, finance & politics. DV: rebroadcasting activity. | |
| Our study | xxxxxx | predictive | Bars & restaurants | Facebook | Model 0: Base Content Approach. Model 1: Content Approach. Model 2: Message Strategy Approach. Model 3: Marketeer's Orientation Approach. Model 4: Viral Marketing Rules Approach. Unsupervised Approach. Media Type Approach. DV: reactions, comments & shares. | Comparison of different models of categorization (supervised & unsupervised). |

* IV = Independent variable(s), DV = Dependent variable(s)
** literature comparison, no impact analysis comparison
EXHIBIT 2 Random ID check of Facebook pages
| Page_name (ID) | Page name on Facebook | Page tags |
|---|---|---|
| 105075676201163 | The Bank | Bar |
| 102555626452916 | Neighborly Bar | Bar, sports bar |
| 641671832534219 | Corner Street Pub | Bar, restaurant, concert place |
| 152649688079204 | The Stables | Bar |
| 128652537199394 | Doc Holliday's Saloon Tombstone, Arizona | Bar |
| 973533759338577 | Bottoms Up | Bar |
| 1434430990167030 | Ritz On The River | Bar |
| 1682898595277000 | Cardinal Cage | Bar |
| 316853401797893 | Bye the Willow | Bar, wine bar |
| 59540095667 | The Malt House | Bar, art & entertainment, whisky bar |
EXHIBIT 3 Distribution plot of dependent variables
EXHIBIT 4 Descriptive overview
Figure 4-1: Most commonly used words
Figure 4-2: Wordcloud of most used words (no condition)
Figure 4-3: Wordcloud of most used words (3% condition)
Figure 4-4: Wordcloud of most used words (3% condition and less meaningful words condition)
Table 4-1: Descriptive statistics 1: Facebook posts dataset

| Variable | Distinct values | Minimum | Maximum | Mean | SD | Median |
|---|---|---|---|---|---|---|
| Feed_id | 240210 | / | / | / | / | / |
| Feed_created_time | 239435 | 2011-01-01 01:41:37 UTC | 2016-12-30 23:53:11 UTC | / | / | / |
| Feed_message | 193845 | / | / | / | / | / |
| Reactions_count | 602 | 0 | 2623 | 7.5982 | 32.42 | 2 |
| Comments_count | 165 | 0 | 1506 | 0.8034 | 7.41 | 0 |
| Shares_count | 230 | 0 | 13969 | 1.2377 | 30.82 | 0 |
| Page_name | 476 | / | / | / | / | / |
| Extracted_on | 2653 | 2017-06-07 13:23:00 UTC | 2017-06-07 14:59:01 UTC | / | / | / |
| Feed_message_length | 2481 | 0 | 15000 | 145.78 | 263 | 91 |
| Feed_words_length | 582 | 0 | 2704 | 24.91 | 42.75 | 16 |
| Weekend day | 2 | 0 (73.69%) | 1 (26.31%) | / | / | / |

Table 4-2: Descriptive statistics 2: classification approaches variables

| Variable | Occurrence (#) | Relative frequency (%) |
|---|---|---|
| TM_1 | 95397 | 39.71% |
| TM_2 | 17865 | 7.44% |
| TM_3 | 86218 | 35.89% |
| MD0_INF | 43241 | 18.00% |
| MD0_TRA.ENT | 147435 | 61.38% |
| MD1_INF | 43120 | 17.95% |
| MD1_ENT | 131148 | 54.60% |
| MD1_TRA | 7388 | 3.08% |
| MD2_FUN | 21 | 0.01% |
| MD2_EXP | 158829 | 66.12% |
| MD2_EMO | 176 | 0.07% |
| MD2_BRA | 141 | 0.06% |
| MD3_TAS | 151278 | 62.98% |
| MD3_REL | 1238 | 0.52% |
| MD3_SEL | 1051 | 0.44% |
EXHIBIT 5 Performance of information RF model
Figure 5-1: MDA for information, Base Content Approach
Figure 5-2: MDG for information, Base Content Approach
EXHIBIT 6 Method for overall estimation results
To check the relationship between a specific classification variable and one of the three DVs (number of reactions, shares or comments), we examined the effect of that variable in four different models (Table 6-1). First, we checked the estimated result when all variables were included in one model (multi-approach model, MAM). Second, we checked the effect of the classification variable in a model built with only the variables from the related classification approach (isolated model, IM). As mentioned before, we applied both an OLSR model and an NBR model in each case.
Table 6-1: Models applied per categorization variable

| | OLSR | NBR |
|---|---|---|
| Multi-approach model (MAM) | 1 | 2 |
| Isolated model (IM) | 3 | 4 |

For each variable, we compared the extent to which the estimated results from the four models are in line with each other. Table 6-2 gives an overview of the overall estimation result per variable. If the results for a specific variable were in line across the four models, we classified the overall effect as green, with the corresponding direction. For example, Emotional from the MSA was found to be significantly and positively related to the number of reactions in the MAM. The unstandardized coefficients were significant with a p-value lower than 0.001 for the OLSR method and with a p-value lower than 0.05 for the NBR method ($\beta_{\text{OLSR(MAM), Emotional}} = 0.597$, p < 0.001; $\beta_{\text{NBR(MAM), Emotional}} = 0.287$, p < 0.05). The isolated MSA model confirmed this significant positive association with the number of reactions for both the OLSR and the NBR method. In general, we can therefore conclude that Emotional is positively related to the number of reactions. We carry these green effects over into our managerial implications section.

Variables that had some insignificant coefficients, or that were removed after stepwise reduction in one of the four models, while all remaining coefficients were significant and related to BPP in the same direction, were classified as orange. Even if one of the estimated effects is insignificant, we assume that an overall effect can still be determined, since the significant coefficients all point in one direction (positively or negatively related to BPP). These estimated effects are therefore also worth mentioning in the discussion and managerial implications. For example, Brand Resonance was removed after stepwise reduction in the MAM (OLSR) and was found to be insignificantly related to the number of shares in the isolated approach with the OLSR method ($\beta_{\text{OLSR(IM), BrandResonance}} = -0.072$, p > 0.05). For the NBR method, Brand Resonance was found to be insignificantly related to the number of shares in the MAM ($\beta_{\text{NBR(MAM), BrandResonance}} = -0.048$, p > 0.05) and marginally significantly and negatively associated with the number of shares in the IM ($\beta_{\text{NBR(IM), BrandResonance}} = -0.385$, p < 0.05). Since three of the four estimated effects were insignificant or removed, we make the overall assumption that Brand Resonance is not significantly related to the number of shares.
All other effects that do not belong to the green or orange category were classified as red. This means that contrary effects were found in at least two of the four estimated results. For example, Food was found to be significantly and negatively related to the number of reactions for the OLSR method ($\beta_{\text{OLSR(MAM), Food}} = -0.111$, p < 0.001; $\beta_{\text{OLSR(IM), Food}} = -0.090$, p < 0.001), whereas it was found to be significantly and positively related to the number of reactions for the NBR method ($\beta_{\text{NBR(MAM), Food}} = 0.229$, p < 0.001; $\beta_{\text{NBR(IM), Food}} = 0.262$, p < 0.001). Since these effects contradict each other, we cannot draw a conclusion about the overall effect and leave these results out of our managerial implications section.
SUMMARY:
- Green: All 4 coefficients for a specific variable are (marginally) significant and related to BPP (number of reactions, shares or comments) in the same direction.
- Orange: At least one of the 4 coefficients is insignificant, or the variable was removed after stepwise reduction; the remaining significant coefficients are related to BPP (number of reactions, shares or comments) in the same direction.
- Red: At least one of the 4 coefficients is (marginally) significant and positively related to BPP, and at least one other coefficient is (marginally) significant and negatively related to BPP.
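The colour rule summarised above can be expressed as a small decision function. The following is a minimal sketch, not the code used in this study: the function name `classify_overall_effect` and the input encoding (+1 or -1 for a (marginally) significant positive or negative coefficient, `None` for an insignificant coefficient or a variable removed by stepwise reduction) are assumptions made for illustration. The case in which all four coefficients are insignificant is not covered by the rule in the text; the sketch treats it as orange, which is also an assumption.

```python
from typing import Optional, Sequence


def classify_overall_effect(signs: Sequence[Optional[int]]) -> str:
    """Classify the overall effect of one variable on one DV.

    `signs` holds one entry per model, in the order of Table 6-1
    (MAM-OLSR, MAM-NBR, IM-OLSR, IM-NBR):
      +1   -> (marginally) significant, positively related to BPP
      -1   -> (marginally) significant, negatively related to BPP
      None -> insignificant, or removed after stepwise reduction
    This encoding is hypothetical and chosen for illustration only.
    """
    if len(signs) != 4:
        raise ValueError("expected exactly four model results")

    significant = [s for s in signs if s is not None]
    has_positive = any(s > 0 for s in significant)
    has_negative = any(s < 0 for s in significant)

    # Red: significant coefficients point in opposite directions.
    if has_positive and has_negative:
        return "red"
    # Green: all four coefficients are significant with the same direction.
    if len(significant) == 4:
        return "green"
    # Orange: at least one coefficient is insignificant or removed and the
    # significant ones agree. Assumption: the all-insignificant case, which
    # the text does not discuss, also ends up here.
    return "orange"


if __name__ == "__main__":
    # The three worked examples from the text.
    print("Emotional / reactions:", classify_overall_effect([+1, +1, +1, +1]))
    print("Brand Resonance / shares:", classify_overall_effect([None, None, None, -1]))
    print("Food / reactions:", classify_overall_effect([-1, +1, -1, +1]))
```

Checking the red condition first keeps the three categories mutually exclusive, so the order of the four model results does not affect the outcome.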
Table 6-2: Estimated overall effect of the variables over the different models

| Variable | Reactions | Shares | Comments |
|---|---|---|---|
| (Intercept) | + | + | / |
| Topic Approach | | | |
| Party | / | - | / |
| Food | / | - | / |
| Performance | + | - | + |
| Base Content Approach | | | |
| Information | + | - | + |
| Transaction/Entertainment | / | / | + |
| Content Approach | | | |
| Information | + | - | + |
| Entertainment | / | + | / |
| Transaction | + | + | + |
| Message Strategy Approach | | | |
| Functional | / | + | x |
| Experiential | / | + | / |
| Emotional | + | x | / |
| Brand Resonance | + | x | + |
| Marketeer's Orientation Approach | | | |
| Task-oriented | / | / | / |
| Relationship/Interaction-oriented | + | + | + |
| Self-oriented | + | / | + |
| Control variables | | | |
| Number of words | + | + | + |
| Weekend day | - | - | - |