
AUTOMATIC CONTENT-BASED EVALUATION OF COMPANIES' FACEBOOK MESSAGES: APPROACHES AND BASELINE IMPACT

Word count: 28,259

Henri De Bruyn
Student number: 000130683248
Supervisor: Prof. Dr. Dirk Van den Poel
Master’s Dissertation submitted to obtain the degree of Master in Business Engineering: Data Analytics
Academic year: 2018-2019

Confidentiality agreement

PERMISSION

I declare that the content of this Master’s Dissertation may be consulted and/or reproduced, provided

that the source is referenced.

Name student: Henri De Bruyn

Signature


Foreword

This master's dissertation is the closing piece of my education in Business Engineering, with a master in Data Analytics. I would like to take the opportunity to express my great appreciation to the people who supported me and made it possible to write this thesis. First of all, I would like to express my deep gratitude to Professor Van den Poel for being my supervisor and for teaching me interesting insights into data analytics over the last years. Secondly, I would like to offer my special thanks to Assistant Professor Meire. My research would have been impossible without his support. He guided me through my dissertation by giving valuable critiques over time. In addition, Dr. Meire was always open to scheduling a meeting and kept me on the right track where needed. I would also like to thank my grandfather, who once in a while made special time to proofread my thesis, focusing on grammar. Next, I would like to thank my parents, who supported me throughout my life, especially over the last years during my studies. I am profoundly grateful to my father, who assisted me through the setbacks during this thesis and always helped me with whatever I was struggling with. Heartfelt gratitude goes to my mum for always providing me with delicious food and lovely talks whenever I needed them most during the writing of my thesis. Finally, I would like to extend my thanks to my friends, who always tried to help me with any question or problem I was facing.



Contents

Foreword
List of Abbreviations
List of Tables
List of Figures
1 Introduction
1.1 Problem definition
1.2 Objectives and research question
2 The rise of Social Media
2.1 Social Media
2.2 Social Media Strategy
2.3 Brand fan pages and posts
3 Classification review
3.1 General overview
3.1.1 (Base) Content Approach
3.1.2 Message Strategy Approach
3.1.3 Marketeer’s Orientation Approach
3.1.4 Viral Marketing Rules Approach
3.1.5 Unsupervised Approach
3.1.6 Media Type Approach
4 (Base) Content Approach
4.1 Literature review
4.2 Classification variables
4.2.1 Entertainment
4.2.2 Information
4.2.3 Transaction
5 Message Strategy Approach
5.1.1 Literature review
5.2.1 Functional
5.2.2 Experiential
5.2.3 Emotional
5.2.4 Brand Resonance
6 Marketeer’s Orientation Approach
6.2.1 Task-oriented
6.2.2 Relationship/Interaction-oriented
6.2.3 Self-oriented
7 Viral Marketing Rules Approach
7.2.1 Event
7.2.2 Product
7.2.3 Promotion
7.2.4 Entertainment
8 Unsupervised Approach
9 Media Type Approach
9.1 Interactivity
9.2 Vividness
10 Valence
11 B2B vs B2C
12 Human classification coding
12.1 (Base) Content Approach
12.2 Message Strategy Approach
12.3 Marketeer’s Orientation Approach
12.4 Takeaways
13 Methodology
13.1 Data
13.2 Model description
13.3 Data preparation
13.4 Unsupervised algorithm
13.5 Supervised algorithm
13.5.1 Singular Value Decomposition
13.5.2 Random Forest
13.5.3 Performance
13.6 Basic impact analysis
13.6.1 Independent variables
13.6.2 Dependent variables
14 Results
14.1 Descriptive statistics
14.2 Topic model
14.3 Random Forest
14.3.1 Information model
14.3.2 Evaluation supervised approaches
14.4 Basic impact approach
14.4.1 Model evaluation
14.4.2 Estimation results
14.4.3 Number of reactions
14.4.4 Number of shares
14.4.5 Number of comments
15 Discussion and managerial implications
15.1 Selection of Classification approach
15.2 Enhancing Brand post popularity
15.2.1 Enhancing the number of reactions
15.2.2 Enhancing the number of shares
15.2.3 Enhancing the number of comments
16 Summary
References
Appendix


List of Abbreviations

AIC Akaike Information Criterion

AMT Amazon Mechanical Turk

AUC Area Under the Receiver Operating Characteristic Curve

B2B Business to Business

B2C Business to Consumers

BCA Base Content Approach

BP Brand post

BPP Brand post popularity

CA Content Approach

CE Customer's engagement

DTM Document term matrix

DV Dependent variable

e.g. exempli gratia

etc. et cetera

IM Isolated model

IV Independent variable

LDA Latent Dirichlet allocation

MAM Multi Approaches model

MDA Mean Decrease in Accuracy

MDG Mean Decrease in Gini

MGC Marketeer-generated content

MOA Marketeer's Orientation Approach

MSA Message Strategy Approach

MTA Media Type Approach

NBR Negative Binomial Regression

OLSR Ordinary Least Squares Regression

OOB Out-of-Bag

RF Random Forest

SL Supervised learning

SM Social Media

SMA Social Media analysis

SMM Social Media marketing

SMS Social Media strategy

UGC User-generated content

UGT Uses and Gratifications theory

UL Unsupervised learning

VMRA Viral Marketing Rules Approach

WOM Word of mouth


List of Tables

TABLE 1: SOCIAL MEDIA CHANNEL USAGE (ASHLEY & TUTEN, 2015)
TABLE 2: CLASSIFICATION APPROACHES
TABLE 3: LITERATURE REVIEW: (BASE) CONTENT APPROACH
TABLE 4: COMMON MESSAGE THEMES OF EACH CLASSIFICATION VARIABLE: (BASE) CONTENT APPROACH
TABLE 5: LITERATURE REVIEW: MESSAGE STRATEGY APPROACH
TABLE 6: MESSAGE STRATEGY USAGE (ASHLEY & TUTEN, 2015)
TABLE 7: COMMON MESSAGE THEMES OF EACH CLASSIFICATION VARIABLE: MESSAGE STRATEGY APPROACH
TABLE 8: LITERATURE REVIEW: MARKETEER'S ORIENTATION APPROACH
TABLE 9: COMMON MESSAGE THEMES OF EACH CLASSIFICATION VARIABLE: MARKETEER'S ORIENTATION APPROACH
TABLE 10: COMMON MESSAGE THEMES OF EACH CLASSIFICATION VARIABLE: VIRAL MARKETING RULES APPROACH
TABLE 11: LITERATURE REVIEW: MEDIA TYPE APPROACH
TABLE 12: OVERVIEW HUMAN CODING CLASSIFICATION
TABLE 13: DIFFERENT MODELS OF IMPACT APPROACHES
TABLE 14: INDEPENDENT VARIABLES
TABLE 15: AUC'S OF THE DIFFERENT CLASSIFICATION VARIABLES
TABLE 16: EVALUATION OF THE DIFFERENT APPROACHES
TABLE 17: ESTIMATION RESULTS FOR BRAND POST POPULARITY, MULTI APPROACHES MODEL
TABLE 18: ESTIMATION RESULTS FOR BRAND POST POPULARITY, TOPIC APPROACH
TABLE 19: ESTIMATION RESULTS FOR BRAND POST POPULARITY, BASE CONTENT APPROACH
TABLE 20: ESTIMATION RESULTS FOR BRAND POST POPULARITY, CONTENT APPROACH
TABLE 21: ESTIMATION RESULTS FOR BRAND POST POPULARITY, MESSAGE STRATEGY APPROACH
TABLE 22: ESTIMATION RESULTS FOR BRAND POST POPULARITY, MARKETEER'S ORIENTATION APPROACH


List of Figures

FIGURE 1: ‘NUMBER OF SOCIAL MEDIA USERS WORLDWIDE FROM 2010 TO 2021 (IN BILLIONS)’, 2019
FIGURE 2: ‘MOST POPULAR SOCIAL NETWORKS WORLDWIDE AS OF APRIL 2019, RANKED BY NUMBER OF ACTIVE USERS (IN MILLIONS)’, 2019
FIGURE 3: CONCEPTUAL FRAMEWORK (DE VRIES ET AL., 2012)
FIGURE 4: CONCEPTUAL FRAMEWORK (CVIJIKJ & MICHAHELLES, 2013)
FIGURE 5: CONCEPTUAL FRAMEWORK (SABATE ET AL., 2014)
FIGURE 6: CONCEPTUAL FRAMEWORK (SWANI ET AL., 2017)
FIGURE 7: CONCEPTUAL FRAMEWORK
FIGURE 8: REACTIONS POSSIBILITIES ON A FACEBOOK POST (KRUG, 2016)
FIGURE 9: OPTIMAL NUMBER OF TOPICS
FIGURE 10: TOP 10 TERMS OF EACH TOPIC
FIGURE 11: AUC-CURVES OF THE DIFFERENT CLASSIFICATION APPROACHES


1 Introduction

Social Media (SM) platforms have experienced exponential growth over the last years. Networking sites like Facebook, Twitter and LinkedIn have shown an exceptional increase in use. Users have shifted from traditional communication channels like mail and post to micro-blogging and SM platforms (Setty, Jadi, Shaikh, Mattikalli, & Mudenagudi, 2014). In the past, companies tried to build relationships with their customers through traditional marketing activities like public relations and direct marketing. Nowadays the passive customer-company relationship has shifted towards an active relationship in which customers become co-creators online, which creates multiple opportunities to increase word of mouth (WOM) and engagement with the brand (Jahn & Kunz, 2012). The rise of SM platforms has also increased the availability of data. Every day, SM databases become richer, which has made “Big Data” a hot topic. Facebook is one of the main SM platforms that makes efficient use of its data, and it has also misused user data. In 2018, 2.7 million Europeans were affected by Facebook’s privacy scandal (‘2,7 miljoen Europeanen zijn getroffen door privacyschandaal Facebook’, 2016). Facebook has been an important factor in the discussion of users’ privacy. Despite this negative publicity for Facebook, thousands of companies make efficient use of their data on Facebook to develop their marketing strategy. By posting specific content online in the form of marketeer-generated content (MGC), they can achieve their marketing goals.

Brands must focus on specific content to increase customers’ motivation to participate online and become loyal to the brand (Sabate, Berbegal-Mirabent, Cañabate, & Lebherz, 2014). Successful content is perceived positively by consumers, who add value to it by liking, commenting, sharing, clicking, etc. This has resulted in an enormous increase in user-generated content (UGC) or WOM (Goh, Heng, & Lin, 2013). Marketeers want more structure and insight into their data to know which content is valuable (Setty et al., 2014). Over the last years, efforts have been made to automatically classify posts into specific content categories. A first effort dates to the late 1600s, when the church tried to track documents that were not of religious content (Hopkins & King, 2010). Similar techniques were used during the mid-1900s, when the term “content analysis” was used for the first time. Recently, the increase of digitalized text on SM, web pages, blogs, online texts, etc. has made automatic content-based evaluation even more important.

Consumers who interact every day on a brand fan page become advocates of that brand. These people are important for the brand since they can influence other consumers’ opinions or purchase behaviour. In other words, marketeers who apply their Social Media strategy (SMS) well by posting the right content are a step ahead in increasing customer engagement (CE) or even in boosting sales performance (Swani, Brown, & Milne, 2014). Moreover, Facebook is the new number one way for companies “to get the word out and bring people in” (Shen & Bissell, 2013).


1.1 Problem definition

The rise of SM has opened a new area of research. Content research on SM posts, tweets and blogs has become a hot topic over the last years. A lot of research has focused on which content variables drive consumers’ engagement in the form of brand post popularity (BPP) (number of likes, number of comments, number of shares, number of click-throughs, etc.). Despite the increasing popularity of this topic, we have identified 3 problems based on previous literature.

Problem 1: What content should marketeers post online (MGC) to increase online customer engagement? This has become an important question.

De Vries, Gensler, & Leeflang (2012) investigated the impact of the valence of comments on BPP. Netzer, Feldman, Goldenberg, & Fresko (2012) conducted an unsupervised analysis of user-generated content to extract market structures and insights. While these two studies focused on UGC, attention has since shifted towards MGC. Kim, Spiller, & Hettche (2015) classified brand posts (BPs) into 3 categories that reflect the marketeer’s perception (task-oriented, relationship-oriented and self-oriented content). Cvijikj & Michahelles (2013) studied which content types (entertainment, information or remuneration) have an impact on BPP. Shen & Bissell (2013) shifted the focus from content delivery to content exchange, classifying Facebook posts (MGC) into event, product, promotion and entertainment. Different outcomes have been found for UGC and MGC. While for UGC both information richness and valence influence purchase behaviour, for MGC only valence plays an important role in the impact on sales (Goh et al., 2013). In other words, marketeers should play a persuasive role in the SM context by using positive words and phrases in their posts. Over the last years, more studies have shifted the focus from UGC to MGC by taking the marketeer’s viewpoint into account. Although this point is stated as a problem, we would rather call it a remark to take into account.

Problem 2: Previous research has focused on one specific content classification framework.

Many studies over the last years have analysed the impact of different content variables on BPP or audience response. Most of these studies have focused on one specific content classification framework, and less is known about how these classification approaches compare. A classification approach or framework refers to a possible way of classifying Facebook posts into different categories. Previous research shows a trade-off between studies that focus on one classification framework accompanied by a predictive approach (e.g. analysing the impact on BPP) (Cvijikj & Michahelles, 2013; de Vries et al., 2012; Sabate et al., 2014) and studies that compare different classification methods to arrive at a general model without analysing the impact of that model on BPP (Tafesse & Wien, 2017). De Vries et al. (2012) classified BPs into vividness, interactivity, informational and entertaining content posts. The framework of Cvijikj & Michahelles (2013) paid more attention to the content of posts but still made use of only one specific classification framework. Some studies used a predictive approach on the number of likes and comments (Swani et al., 2014; Swani, Milne, Brown, Assaf, & Donthu, 2017), while other studies were qualitative in nature (Davis et al., 2014; Jahn & Kunz, 2012; Tafesse & Wien, 2017). Although these studies each made use of a well-established framework, no study has compared the effectiveness of different content frameworks.

Problem 3: The classification frameworks of previous research are chosen at random (not based on previous literature).

The categorization frameworks for social BPs in previous research are mostly based on subjective choice. The study by de Vries et al. (2012) analysed the impact of different post types on customer engagement. This research was based on a conceptual framework for the determinants of BPP (number of likes and number of comments). Although this framework is well applicable to BPs, it has an important limitation. The authors built this framework on their own preferences without looking at how frameworks had been used in previous literature. The framework of Cvijikj & Michahelles (2013) is better aligned with previous research. It made use of the Uses and Gratifications theory (UGT) and looked at what drives consumers to engage online with a preferred brand. Still, this framework focused only on the UGT and did not look at other possible approaches for the categorization of BPs. To our knowledge, the study by Tafesse & Wien (2017) is the first to present a formalized analysis of BPs. Its classification frameworks were based on an extensive review of previous SM literature. Still, this study focused on building 1 overall classification framework (consisting of 12 exhaustive and mutually exclusive categories). The study neither analysed the different content classification approaches used in previous research nor assessed a baseline impact on BPP. Overall, the frameworks of previous research have been too ad hoc to yield well-established, distinct classification approaches.

Given the growing interest in fostering online engagement between companies and their customers, companies want to know which content classification approaches are available and to what extent each approach contributes to the online relationship with consumers. Thus, this thesis has the following purpose:

To investigate which classification approaches have been used in automatic content classification of Facebook messages, in order to find out which approach is most suitable for content classification and to what extent each category has an impact on BPP (likes, shares & comments).


1.2 Objectives and research question

The exponential increase of SM and the associated data has led to more research on what content influences consumers to participate online. Since this topic is quite new, it still offers many opportunities for further research. We found 3 limitations in previous research, which we will tackle in our study. Remark 1 stated that managers and marketeers want more valuable insights into what content to post to increase customer engagement. The first studies in Social Media analysis (SMA) mostly concerned UGC (L. de Vries et al., 2012), while MGC has received more attention over the last few years (Kim et al., 2015; Swani et al., 2017). Since managers want more useful insights into what content to post on their brand page, we will take the same viewpoint and look at what drives marketeers to post specific content online (MGC). Problem 2 arose from the limitation that previous research has focused on one specific content classification framework. To our knowledge, no study has ever compared the effectiveness of different content classification approaches. This is the first study to compare different content coding methods together with their relationship with BPP. Problem 3 stated that previous categorization frameworks are not based on an extensive review of previous SM literature. Prior research on SM content made use of limited characteristics. There is a need for a more comprehensive approach to the content categorization of SM posts. The classification approaches of our thesis will therefore be based on previous literature. Our thesis will take these 3 limitations into account.

This master's dissertation intends to answer the following research questions:

RQ1: Which approaches are currently used for automatic content classification of companies’ Facebook messages, what does each categorization approach look like, and is there a most suitable model for automatic content classification?

RQ2: What is the impact per category of each classification approach on brand post popularity

(number of likes, shares & comments)?

This dissertation is structured as follows. First, we present the main concepts of SM, SMS and brand fan pages and posts. Secondly, we elaborate on previous literature, focusing on which approaches have been used in SM content classification. In the next sections, we briefly look at how media types and valence have been used in preceding research, followed by the main differences in content classification between Business to Business (B2B) and Business to Consumers (B2C) environments. Subsequently, a short section is dedicated to the human classification of Facebook messages. Next, we present the methodology of our research, followed by the findings of our analysis. Finally, we present the conclusions drawn from our results and end with the limitations of our research and possibilities for future research.


2 The rise of Social Media

Facebook is at this moment the largest growing social network. In the third quarter of 2018 it had 2.27

billion monthly active users, and this number is still increasing every quarter. In 2012 Facebook exceeded

1 billion monthly active users, which made it the first social network ever to surpass this mark

(‘Number of monthly active Facebook users worldwide as of 1st quarter 2019 (in millions)’, 2019).

Facebook has become one of the main WOM communication channels for brands. Marketeers perceive it as the

most attractive SM network in a B2C environment (Cvijikj & Michahelles, 2013). Over the last years,

the importance of SM for brands to communicate with their consumers has increased. Nowadays,

companies are using SM to improve their customer relationships, services, sales promotions, branding

and research (Ashley & Tuten, 2015). Throughout this paper we will make use of the UGT, which tries to

understand the goals and motivations of individuals for social engagement with different types of posts

(Cvijikj & Michahelles, 2013). It explains that consumer needs for communications are aligned towards

content, relationships and themselves while focusing on mass media communication (Ashley & Tuten,

2015). The UGT explains why people use different types of SM (de Vries & Carlson, 2014; Jahn & Kunz,

2012). Specific to our paper, we will focus on which characteristics of MGC drive people to social

engagement by looking at which different content classification approaches have been used in SM

research. By making use of the UGT we will better understand which content marketeers should post to

increase their CE. Before analysing the different classification approaches, we will go deeper into some

of the main concepts used in SM content analysis.

2.1 Social Media

SM is a virtual place on the internet, which allows bringing people together from different cultural and

geographical backgrounds on a large scale, where people can express themselves online by interacting

and sharing their opinions (Tafesse, 2015). It allows people to create and exchange user-generated

content (Jahn & Kunz, 2012; Shen & Bissell, 2013). SM are presented in many different forms on the

web including blogs, forums, photo-sharing platforms, social gaming, micro blogs, chat apps, and, most

importantly, social networks. In 2018, 2.62 billion people used social network sites, which is more

than one out of four people in the world. It is predicted that by 2021, the mark of 3 billion active users on

SM will be exceeded (‘Number of social media users worldwide from 2010 to 2021 (in billions)’,

2019) (Figure 1).

Figure 1: ‘Number of Social Media users worldwide from 2010 to 2021 (in billions)’, 2019

Eastern Asia and Northern America are the global regions where SM is most popular with a penetration

rate of 70%, followed by Northern Europe (‘Global social network penetration rate as of January 2019,

by region’, 2019). Facebook is the leading network site based on active users, followed by YouTube

and WhatsApp, which can be seen in Figure 2 (‘Most popular social networks worldwide as of April

2019, ranked by number of active users (in millions)’, 2019). A trend that we can see in the last years,

is the switch from advertising on PC to mobile advertising since mobile devices are taking the global

lead in SM use (‘Market-Revenue Per Internet User’, 2019). Mobile-first platforms such as Instagram or

Twitter have become more popular. Marketeers can advertise their BP’s in order to increase the reach

of their message on the market. The United States is the country where most ad spending is generated, followed

by China. SM have become the number one place where marketeers can implement their strategy.

Figure 2: ‘Most popular social networks worldwide as of April 2019, ranked by number of active users (in millions)’, 2019

On average, people spend around 20 to 25% of their total time on the Internet and on SM network sites

(Tomaras & Ntalianis, 2015). Global Internet users spend around 135 minutes per day surfing on SM

(‘Social Media Statistics & Facts’, 2019). This exponential increase of SM has changed the way

marketeers should cooperate with their consumers. Each SM site has its own characteristics in terms of

culture and purpose that can be used to execute a specific SMS. This can have a significant impact on

the business practices (Kim et al., 2015; Swani et al., 2014). New opportunities have arisen due to

the SM explosion: companies can increase their public awareness about the brand or even better, align

their product development through closer community involvement. Companies start online competitions

where they let their consumers cooperate in developing a new product and for which the winning team

wins a job offer (Cvijikj & Michahelles, 2013). People are using SM to get specific two-way interactions

with their brand. They make use of SM sites when traditional communication channels are unavailable,

time-consuming or expensive (Davis et al., 2014).

Table 1: Social Media Channel Usage (Ashley & Tuten, 2015)

The study from Ashley & Tuten (2015) analysed which SM channels are being used by companies. An

overview of the top SM channels can be seen in Table 1. Microblogs (e.g. Twitter), social networking

(e.g. Facebook) and microsites (sites at a separate web address to forward to a friend) were the most

commonly used channels. This leads to the conclusion that marketing communication is happening

where customers are most active nowadays. The data for our research come from Facebook.

2.2 Social Media Strategy

Social Media marketing (SMM) or SMS is “the usage of the existing SM platforms for increasing the

brand awareness among consumers on online platforms through utilization of the WOM principles”

(Cvijikj & Michahelles, 2013, p. 845). Another definition is “the utilization of SM technologies,

channels, and software to create, communicate, deliver and exchange offerings that have value for an

organization’s stakeholders” (Tafesse & Wien, 2017, p. 4). The efficient use of a well-defined SMS

gives the company a lot of opportunities such as increasing public awareness or making efficient use of

the data provided by SM. B2B and B2C marketeers are using different SMS’s to increase CE with their

target group (Swani et al., 2014, 2017). A main component of implementing an SMS is the use of brand pages

(BP), which are online social networking platforms that connect brands with their customers and fans. One of the

main reasons why BP are so important for the marketing strategy is that they allow brands to build an online

community and interact with it (Tafesse & Wien, 2017). Brand communities are communities

recognized by shared values, rituals, myths, hierarchy, vocabulary and traditions, but also by a sense of

moral responsibility. It is a driver for brand commitment, which boosts the relationship between the

brand and the consumers (Gummerus, Liljander, Weman, & Pihlström, 2012; Jahn & Kunz, 2012).

Actively participating in an online brand community removes the physical as well as the temporal barriers,

which increases the likelihood of consumers participating in the online community (Davis et al., 2014).

Secondly, it also improves the WOM communication, which is a powerful tool for marketing since

WOM has made an exponential increase in volume on SM platforms (Cvijikj & Michahelles, 2013).

More specifically, we are talking about electronic WOM, which comes from the online communication

between consumers who interact on SM posts from brands (Kremers, n.d.; Tafesse,

2015). Furthermore, people want to become a member of an online community to increase satisfaction

within the community and to increase their personal degree of influence to other people of the

community (Jahn & Kunz, 2012). Marketeers are capitalizing on the trend of brand communities on SM

by increasing CE and generating WOM, which results in richer information sharing and a better

understanding of the drivers of sales (Goh et al., 2013).

A study from Davis et al. (2014) identified five core elements that drive brand consumption in a SM

community, which can be used as opportunities to increase the activity of a beneficial online community:

(1) Functional brand consumption (e.g. problem solving, information searching, evaluate services etc.),

(2) Emotional brand consumption (e.g. enjoyable interactions, feeling privileged or recognized,

satisfaction etc.), (3) Self-oriented brand consumption (e.g. self-actualization, perception and branding),

(4) Social brand consumption (e.g. social interaction, community attachment, experience exchange etc.)

and (5) Relational brand consumption (e.g. desire to know the people behind the brand and to get

personalized interaction and co-creation of the service offered). Each of these 5 drivers is an interesting

opportunity for brands to strengthen the relationship with their consumers. Brands should pay attention

to these drivers to get value out of the interactivity with the customers.

Over the last decades, different studies have tried to classify BP’s to make efficient use of a company

message strategy. A message strategy is the primary tactic to deliver the key message. It aligns the content

of the brand with the needs of the consumers. Furthermore, it tries to bridge the gap between what

consumers need to hear and what marketeers want to say (Tafesse & Wien, 2017). Brands even use SM

to get more interaction with their customers by introducing new products online, sharing brand related

information or even announcing free giveaways (Newman, 2012). Companies want to increase their

consumer engagement through a well-implemented SMS.

Engagement means interacting and cooperating with community members (Cvijikj & Michahelles,

2013). Another definition describes engagement “as a consumer relationship that recognizes that people

are inherently social and look to create and maintain relations not only with other people, but also with

brands.” (Ashley & Tuten, 2015, p. 17). More specific, we will focus on CE which “entails the

customer’s interactive experience with the brand, is context-dependent and enhances consumers”

(Gummerus et al., 2012, p. 859). Marketeers who adopt the engagement perspective are shifting the

focus from a transactional relationship to an interactional perspective. Applied to SM, it means clicking,

sharing or commenting on a BP. Companies that understand the characteristics that influence the

level of CE, and apply them well to increase their volume of WOM, are a step ahead in brand awareness,

which may result in higher revenue. Secondly, CE has a positive relationship with loyalty and

satisfaction. People who are satisfied with the products of the brand they prefer are more likely to join

a brand community (Gummerus et al., 2012).

A study from Lee, Hosanagar, & Nair (2018) went even deeper into it, shifting the focus from SMM to

content marketing with a specific focus on developing content that increases engagement. Content has

become more important in SMS because Facebook posts need to be short and to the point and user

engagement is measured daily. Besides, every day, SM data from companies are becoming larger which

makes it even more important. Because of the exponential increase of SM sites, a new marketing

approach was even born, “viral marketing”, which is the spread of the original message of the brand

through consumer interaction. Nowadays, companies make efficient use of this technique to increase

brand image promotion by focusing on those characteristics that have a higher degree of spread (Shen

& Bissell, 2013). Even in the tourism industry, SM plays an important role. Strategies that are aligned

with SM help destinations to remain competitive (Kiráľová & Pavlíčeka, 2015). SM can increase brand

awareness, brand engagement and WOM by implementing a well-developed communication strategy.

It even enables visitors to communicate with each other by sharing their opinions about recent experiences.

Publishing posts on SM can be used for advertising a brand or specific products or services.

2.3 Brand fan pages and posts

Brands connect with their customers and fans by sending messages on a regular basis to the world. These

BP’s appear in consumers’ newsfeeds whether they like the specific brand fan page or not. Companies

can also make use of sponsored posts to increase the reach of the message (Tafesse, 2015). BP’s are a

rich form of communication and serve different goals depending on the meaning of the message.

They enrich the relationship between the brand and the customer and provide information to the

followers (de Vries et al., 2012). BP’s also have the ability to support multiple media types (e.g. photos,

text, links, videos, quizzes etc.). They strengthen the brand relationship with its customers. Fans who

follow a brand page (like page) do not only comment on or like the brand’s regular posts, they also interact with

other consumers by liking and reacting to other comments. Automatic response options (e.g. likes &

shares) allow consumers to respond instantly and interactively without putting a lot of effort into it (Tafesse

& Wien, 2017). A brand fan page is in the first place one of the main connections between the consumers

(followers) and the brand (Jahn & Kunz, 2012). It empowers consumers to leave their opinions and

express their feelings which all contribute to the overall richness of the brand (Tafesse & Wien, 2017).

Secondly, it helps companies to communicate on a global level and to do marketing at a personal level

(Cvijikj & Michahelles, 2013). Thirdly, BP’s are a goldmine of information which can deliver social

benefits for its followers (de Vries et al., 2012). It is a useful tool to deepen the relationship between the

brand and a consumer (Jahn & Kunz, 2012). Besides liking, sharing or commenting on a BP, followers

can also send private messages to brand pages for specific personalized questions which support even

more the customer-brand relationship (Tafesse & Wien, 2017). BP’s can have one or more moderators,

who are the owners of like pages and control the page. BP’s can have any number of members, who are

also known as followers. In fact, fans can engage with a company brand page by (1) liking existing posts

posted by the company, (2) sharing posts on their own wall page, (3) posting content on the company’s

wall, and (4) leaving a comment on a Facebook post (Cvijikj & Michahelles, 2013). All these actions

contribute to the implementation of a good SMS through WOM communications.

3 Classification review

Previous literature has studied different models of content classification frameworks together with their

relationship between content and BPP (Cvijikj & Michahelles, 2013; de Vries et al., 2012; Swani et al.,

2017), audience response (Tafesse, 2015), CE (Lee et al., 2018) or brand loyalty (Shen & Bissell, 2013).

Although these studies use different names for their dependent variable (DV), most of them use the same

variables as a measure of their DV (number of likes, comments & shares). Other studies have focused

more on the qualitative side of research through a survey (Gummerus et al., 2012; Jahn & Kunz, 2012).

As mentioned before, previous classification frameworks are most of the time chosen at random, whether

the study is based on predictive or prescriptive research. This section gives an overview of which

classification approaches have been used in previous research and where the categorization is coming

from, meaning which viewpoint and concepts it takes into account.

First, this section gives a general overview of previous literature which focused on content

classification of (Social Media) messages. Second, we will give a first brief overview of the

different (content) classification approaches that have been found in literature. In the following sections,

a deeper analysis of the different approaches will be given, followed by a first look at how

media elements and valence have been used in previous content literature.

3.1 General overview

Exhibit 1 provides a representative overview of literature which has focused on the classification of

different SM posts. For each study, we checked whether the following characteristics are involved

in the specific research. (1) Media elements: does the framework take photos, links, videos etc. into

account? (2) Main focus: is the main focus of the study on the content itself (e.g. angry, valence, sad

etc.) or is the focus on the types of content (e.g. entertainment, information, transaction etc.)? (3)

Learning approach: is the classification of the content messages based on a supervised (SL) or on an

unsupervised learning (UL) approach? (4) Dependent variable: does the DV focus on engagement (e.g.

number of likes, reactions etc.) or on sales (e.g. repeat purchase behaviour)? For each study the

research method (predictive / descriptive / exploratory...), industry and data source are also given,

followed by what the classification framework looks like and, finally, some extra information on the

framework.

Facebook has been the most used data source for content analysis. However, Twitter has become more

popular in later SM content studies. There has been a main focus on content classification through SL

as well as on engagement as a DV. Some papers have focused more on the possible types of contents

while other papers really have focused on what sentiment we can find in the content (content itself).

Most of the early studies, which analysed the relationship between Facebook posts and BPP, also took

media elements into account. Over the last years there has been a shift to studies that focus more

on the content of posts. However, there is still a large potential for further research on content analysis.

First of all, only a few studies have used a UL approach (Netzer et al., 2012; Zhang, Moe, &

Schweidel, 2017). Secondly, online engagement has been studied widely over different types of SM, but

few studies have analysed the relationship between content and the

sales of a company (Goh et al., 2013; Rishika, Kumar, Janakiraman, & Bezawada, 2013). Thirdly, to our

knowledge, no study has compared different framework approaches of content classification to check if

there is a “best” model for content classification. This research is the first study that provides a

classification literature review and proposes different distinctive classification

approaches which can be used as a starting point for further content research in SM. Table 2 gives an

overview of the different classification approaches.

Table 2: Classification Approaches

Approach | Classification

(0) Base Content Approach (BCA) | Information; Entertainment/transaction

(1) Content Approach (CA) | Information; Entertainment; Transaction

(2) Message Strategy Approach (MSA) | Functional; Experiential; Emotional; Brand resonance

(3) Marketeer's Orientation Approach (MOA) | Task-oriented; Relationship/Interaction-oriented; Self-oriented

(4) Viral Marketing Rules Approach (VMRA) | Promotion; Product; Entertainment; Event

Unsupervised Approach (UA) | *

Media Type Approach (MTA) | Interactivity; Vividness

(0, 1, 2, 3 & 4) have a pre-classified framework (Supervised Approach).

* Unsupervised Approach has no pre-classification framework (e.g. Factor Analysis or Topic Analysis).
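For readers who want to work with these frameworks programmatically, the supervised rows of Table 2 can be written down as a small lookup structure. This is a minimal sketch under stated assumptions: the constant `SUPERVISED_APPROACHES` and the helper `is_valid_label` are hypothetical names introduced for illustration, and only the category names themselves come from Table 2.

```python
# Hypothetical encoding of the supervised approaches of Table 2.
# Keys are the abbreviations used in the text; values are the predefined
# categories of each approach, written in lower case.
SUPERVISED_APPROACHES = {
    "BCA": ("information", "entertainment/transaction"),
    "CA": ("information", "entertainment", "transaction"),
    "MSA": ("functional", "experiential", "emotional", "brand resonance"),
    "MOA": ("task-oriented", "relationship/interaction-oriented", "self-oriented"),
    "VMRA": ("promotion", "product", "entertainment", "event"),
}


def is_valid_label(approach: str, label: str) -> bool:
    """Return True if `label` is a predefined category of `approach`.

    Unknown approaches (e.g. the unsupervised approach, which has no
    pre-classification framework) yield False for every label.
    """
    return label.lower() in SUPERVISED_APPROACHES.get(approach, ())
```

A lookup like this can serve as a sanity check when human coders or a classifier assign labels, for instance to reject a CA-only label such as "transaction" under the BCA.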

Different ad hoc approaches can be found based on previous literature. Tafesse & Wien (2017) classified

previous literature into 3 main ad hoc approaches. We have updated & analysed these 3 main approaches

and added a 4th approach which is worth mentioning. All of these 4 classification approaches made use

of SL to classify the different messages in the right category. The qualitative research from Tafesse &

Wien (2017) has only focused on specific possibilities of classification. This study already gives us a

good insight into a “best” model of content classification and different approaches that have been used in

the past, although it did not take the impact of the different variables on CE into account. Still, this

research was, to our knowledge, the first that compared different classification frameworks that have

been used in previous literature and came up with a categorization of 12 exhaustive and mutually

exclusive categories of BP’s. Firstly, the framework can be used on a daily basis by marketeers to inspire

new BP’s. Secondly, it can be used to tune up a company’s content strategy. For example, promotional

BP’s can be used to stimulate sales or customer relationship posts can be used to build a brand

community. Although Tafesse & Wien (2017) already mentioned the 3 different approaches in their

research, they did not go deeper into the different approaches or analyse exactly how these

frameworks are derived from previous studies. Our research will take this limitation into account and

will look at the source of the different approaches. The research from Tafesse & Wien (2017) already

gave us a better understanding of different content approaches which have been used. In addition, it

helps us to distinguish different content variables and to understand them better, which we will use in our

proposed frameworks.

3.1.1 (Base) Content Approach

The first approach tries to differentiate posts based on entertainment, information & transaction (Cvijikj

& Michahelles, 2013; de Vries et al., 2012). However, this type of classification is ineffective for

some types of posts. Where should posts about brand resonance (posts about the identity of the brand) or

relationship-oriented posts (e.g. customer feedback, customer testimony, Q&A) be classified? Tafesse

(2015) took this limitation into account in his research and classified these 3 variables into one group variable

“content-type”. Cvijikj & Michahelles (2013) also took posting time & media type into their model,

which limits its knowledge about the content of posts. Tafesse (2015) also took vividness, interactivity,

novelty and brand consistency into his model besides content type, which gave his framework a large scope

compared to previous research. Our study will take this CA into account (information, entertainment

and transaction) and analyse how this framework has been used in previous literature. Secondly, our

study has come up with a BCA framework which will be used as a base categorization and which consists

of only 2 categories: information and entertainment/transaction.

Although previous research from Lee et al. (2018) already came up with a base model and

classified Facebook posts into brand personality-related and directly informative posts, this framework

still lacks some specific types of posts. Consequently, we can say that this study came up with a new

base classification approach which is, to our knowledge, the most suitable.

Although we call this approach the “content” approach, this does not mean that the other approaches did

not focus on the content of the messages. The difference between the (Base) Content Approach and the

other classification approaches is that the BCA and the CA take the content of the message as a starting

viewpoint to classify different messages, while the other approaches take another viewpoint than content

as a starting point (e.g. starting by looking at which message strategies are used, instead of looking at

which different types of content are present). Despite the different starting points, each message will still be

classified based on the content of the different messages.

3.1.2 Message Strategy Approach

The second approach takes some traditional message strategies into account like functional, experiential,

emotional and brand resonance while ignoring several other message strategies (Ashley & Tuten, 2015;

Swani et al., 2014, 2017). It reviews which strategies can be used to bridge the gap between what a

marketeer wants to say and what the consumer needs to hear. While these studies give us a good insight

into how different message strategies are applied to brand posts, they fail to consider what is really

stated in the post (content of the posts).

3.1.3 Marketeer’s Orientation Approach

The third approach is derived from previous literature which focused on the consumers’ perceived

scheme (e.g. social, functional, and self-concept categories) (Davis et al., 2014; Jahn & Kunz, 2012).

These studies have put the attention on the subjective meaning of the consumers. Compared to these

studies, we will focus our attention on the marketeer’s perceptual feeling with customer’s engagement.

The study from Kim et al. (2015) already took this viewpoint into account and came up with the

following three types of orientation marketeers have when using SM. Through task-oriented

communication they want to achieve a goal (e.g. increase sales of the company). A second

orientation is based on interaction. Marketeers that make use of this orientation want to improve the

relationship with their customers (e.g. increase customer [...]). A third orientation is primarily concerned with its own desires and needs

when interacting with others (e.g. increase brand awareness). Self-oriented messages focus on the

personal thoughts and feelings of the brand. Kim et al. (2015) analysed the impact of this approach on

the brand’s perceived intention to post messages online.

3.1.4 Viral Marketing Rules Approach

A fourth approach, which is new compared to Tafesse & Wien (2017), classifies posts into event,

product, promotion and entertainment (Shen & Bissell, 2013). Compared to other approaches, this

framework focuses on the viral marketing rules (increase awareness of a specific post). Even though this

classification framework is a good approach on its own, it has too much overlap with the 3 previous

approaches. We will therefore not take this model into the methodology of our study, which will be further

explained. Still, it is worth mentioning what this approach looks like for companies who want to classify

posts specifically based on the viral marketing rules.

3.1.5 Unsupervised Approach

Unsupervised classifications have been used less than supervised classification approaches. They

try to get meaningful insights & information out of unstructured data. Zhao, Jiang, Weng, He, & Lim

(2011) came up with the following topics: arts, business, education, style, tech-science and world,

specific to data coming from New York Times articles, and arts, business, family & life and twitter,

specific to data coming from Twitter. Netzer et al. (2012) found 3 topics (school, finance and politics)

coming from business school’s data. It is clear that topics and themes are inherent to the characteristics of

the data.

3.1.6 Media Type Approach

In the content analysis literature of SM posts, media type elements have been commonly added to the

framework, next to the other supervised classification variables. Previous research has looked at how

media elements (videos, pictures, links, url’s etc.) have an impact on online engagement of brand pages.

Media types can be classified into two categories. Vividness relates to the extent to which a specific type

of media stimulates one of our five senses (Cvijikj & Michahelles, 2013; L. de Vries et al., 2012).

Interactivity focuses on the interaction between two parties, and it looks at how interactive the post

appears to the brand followers (L. de Vries et al., 2012; Tafesse, 2015).

The exponential increase of SM has resulted in an increasing amount of studies focusing on content

classification of SM posts. This has resulted in too many different classification frameworks. Researchers

have been making use of their own preferred classification approach without looking at how it had been

done in previous literature. By giving an overview of the different studies in Exhibit 1, each

with its own characteristics, we have tried to give the best possible summary of previous literature which

focused on message content classification. Different frameworks have emerged over the last years, all

with their own viewpoint and characteristics. Although the classification of posts into a specific

framework remains quite subjective, our research tries to categorize the different approaches used into

one of the specified classification frameworks (BCA, CA, MSA & MOA). We also added an extra

supervised approach (Viral Marketing Rules Approach). As mentioned before, we will not take this

approach into our analysis since it has too much overlap with the other supervised approaches; we will

only look at how this approach is derived compared to previous literature.

In the next sections, we will go deeper into each of the established approaches and look at how each

framework is related to previous literature. In addition, a deeper meaning of each classification variable

is given, together with a literature review of how the classification variables of each approach are

related to BPP. Furthermore, we will also look at how unsupervised approaches, media elements and

valence have been used in SM literature.

4 (Base) Content Approach

4.1 Literature review

Based on the review of previous classification literature, we have identified one base (main) content

classification framework that consists of information and entertainment/transaction. The other

content approach splits entertainment and transaction into two separate content variables. Compared

to the other classification approaches, these two models look at what content is provided by the messages

as a starting point. A literature review of the base content approach and the content approach can be

found in Table 3.
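Because the base content approach only merges two categories of the content approach, a post coded under the CA can be recoded under the BCA mechanically. The sketch below illustrates that relationship; the names `CA_TO_BCA` and `collapse_to_bca` are hypothetical and chosen for illustration only.

```python
# Hypothetical mapping from the three CA categories to the two BCA categories,
# following the description above: entertainment and transaction are merged.
CA_TO_BCA = {
    "information": "information",
    "entertainment": "entertainment/transaction",
    "transaction": "entertainment/transaction",
}


def collapse_to_bca(ca_label: str) -> str:
    """Map a Content Approach label to its Base Content Approach label."""
    key = ca_label.strip().lower()
    if key not in CA_TO_BCA:
        raise ValueError(f"unknown CA label: {ca_label!r}")
    return CA_TO_BCA[key]
```

The reverse direction is not possible without re-reading the post, since a BCA label of entertainment/transaction does not reveal which of the two CA categories applies.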

Research from Lee et al. (2018) already came up with a standard two-feature classification framework:

brand personality-related variables, which consist of emotions, humour, small talk etc., and directly

informative variables, which consist of mentioning deals, prices or products in the BP. Although this

is a suitable framework, it has some limitations. First of all, the framework from Lee et al. (2018) does

not make a distinction between informational posts and transactional posts. Posts that aim to stimulate

transactions between the company and the consumers by mentioning deals, sweepstakes or price

discounts are all stored under the informational posts. Secondly, brand personality-related posts take the

emotional side of posts more into account, which is more related to how it is being said. As mentioned

before, our classification approaches focus on what is being said by looking at what content is provided

in the messages. That’s why we have come up with a new base classification approach that classifies

posts into information and entertainment/transaction. This classification is aligned with previous

advertising applications of banners. Still, the research from Lee et al. (2018) assisted our research in

setting up the different meanings of the categories of the approaches used in our study.

Table 3: Literature Review: (Base) Content Approach

Authors | Informational | Entertainment/transaction: Entertainment | Entertainment/transaction: Transaction

Cvijikj & Michahelles (2013) | information | entertainment | remuneration

Tafesse (2015) | informational content | entertaining content | transactional content

Lee et al. (2018) | directly informative (brand mention, price, product mention) | brand personality-related (holiday mention and humor used) | directly informative (deals, price compare, discounts etc.)

de Vries et al. (2012) | informational content | entertaining content | /

Stephen et al. (2015) | information | arousal-oriented | /

Swani et al. (2014) | information search | / | selling strategy (calls to purchase)

Swani et al. (2017) | information search | / | selling strategy (calls to purchase)

Goh et al. (2013) | content information richness | / | /

Gummerus et al. (2012) | / | entertainment | /

Setty et al. (2014) | / | life events and entertainment posts | /

The study from de Vries et al. (2012) has analysed the impact of different post types on BPP. The

framework consisted, among other features, of informational content and entertainment content,

which are posts that are perceived as fun and exciting to read (Figure 3). Still, this framework lacked the

possibility to classify posts about transactions or remuneration-based posts (sweepstakes, deals, bonuses

etc.). Although this is quite a good first approach to content classification, the research did not mention

which classification theories or previous frameworks its model was based on. It looks like the

categorization was rather a first rough guess at a possible framework.

Figure 3: Conceptual Framework (de Vries et al., 2012)

A later study from Cvijikj & Michahelles (2013) expanded the research from de Vries et al. (2012). It took

not only the content of a BP into account, but also the time when the content should be posted

(which was a control variable in the study from de Vries et al.) (Figure 4). Secondly, it added a third

variable (remuneration) to the content type category. A positive update compared to the study from de

Vries et al. (2012), is that the new framework is based on the UGT by looking at what factors drive or

motivate consumers for online engagement instead of an “at random” chosen categorization. A first look

at how these variables are related to online engagement showed that entertainment posts have the highest

level of engagement. In addition, information related posts increase the number of likes and comments.

To increase the number of comments, moderators should make use of remuneration posts.

Figure 4: Conceptual Framework (Cvijikj & Michahelles, 2013)


Still, we find the third category, remuneration, not straightforward. Is it also applicable to posts related

to loyalty programs or links for payment, or does it only focus on sweepstakes? That’s why we have

changed this variable to transaction, which has a broader scope than remuneration alone. Besides price

promotions, sweepstakes & loyalty (remuneration), it covers everything that aims to bring about a

transaction between the consumer and the brand. The study from Tafesse (2015) also made use of this

category when analysing how these different types of content influence customers’ responses to Facebook

posts.

Previous research from Stephen, Sciandra, & Inman (2015) has a broader range of content characteristics

compared to the study from Lee et al. (2018) and focuses on what branded content says (information &

calls to action) or how it is said (arousal- & persuasion-oriented). Two out of the four content

characteristics are aligned with our CA. First of all, the arousal-oriented content characteristic tries to elicit

positive responses from consumers (positivity and humour). This characteristic is most aligned with

the entertainment category of our approach. The main difference is that the arousal characteristic looks

more at how it is said, while the entertainment characteristic looks at what is said. Secondly, the

information content characteristic refers to how much the post is associated with informational cues. It

focuses on product-related information, value-related information (value- or price-related information)

or brand-related information. This category is well aligned with the information category of our content framework.

Despite the fact that this classification is a good standard approach, this framework will be less effective

against some message strategies (e.g. Where should we classify posts about brand resonance or social

cause?) (Tafesse & Wien, 2017). To overcome this limitation, we will also include the Message Strategy

Approach in our study.

4.2 Classification variables

For each categorization approach, we will also give an overview of common message themes used for

each variable of the accompanying classification approach. It gives a good and quick understanding of

the categorization framework. Table 4 gives an overview of the common subjects for entertainment,

information and transaction.

Table 4: Common message themes of each classification variable: (Base) Content Approach

Variable | Common message themes
Entertainment | Funny, humorous, humorous items, artistic works, events etc.
Information | Product specifications, product reviews, product recommendations etc.
Transaction | Sweepstakes, deals, bonuses, promotions, discounts, loyalty programs, links for payment etc.
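To make the themes of Table 4 more tangible, the following minimal sketch tags a post with every variable for which it contains a theme keyword. The keyword lists and the function name tag_post are hypothetical and only loosely derived from Table 4; this is a simple lookup for illustration, not the supervised models used in this study.

```python
import re

# Hypothetical keyword lists, loosely derived from the themes in Table 4.
# They are illustrative assumptions, not the features used in the study.
THEME_KEYWORDS = {
    "entertainment": {"funny", "humor", "humour", "joke", "art", "event"},
    "information": {"specification", "specifications", "review", "reviews",
                    "recommendation", "recommendations"},
    "transaction": {"sweepstake", "sweepstakes", "deal", "deals", "bonus",
                    "promotion", "discount", "loyalty", "order"},
}


def tag_post(text):
    """Return the set of variables whose keywords occur in the post text."""
    # Lower-case the text and keep alphabetic tokens only.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {variable for variable, words in THEME_KEYWORDS.items()
            if tokens & words}


example = "Get a 20% discount on your next order and read the latest reviews!"
print(sorted(tag_post(example)))  # ['information', 'transaction']
```

A post can receive several tags at once, which reflects that a single BP may combine, for instance, product information with a discount.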

4.2.1 Entertainment

Entertainment is “the act of providing or being provided with amusement or enjoyment” (‘Definition of

entertainment’, n.d.). Posts with an entertainment characteristic are perceived to be fun, exciting and

cool. These kinds of posts are most of the time unrelated to the brand or a product (e.g. anecdotes,

slogans, word play, humorous items, artistic works). It encourages people to contribute to the content

(Cvijikj & Michahelles, 2013; de Vries et al., 2012; Tafesse, 2015). It stimulates direct interaction

between the brand and its consumers (e.g. Q&A, survey) (Shen & Bissell, 2013). Common subjects are

movies, TV shows, series, shows etc. (Lee et al., 2018). An example of this specific type of post is

“What a lovely day, what are your plans today?”

Previous research from de Vries et al. (2012) found that entertainment posts have a negative impact on

the number of likes. This could be due to the fact that this information is unrelated to the brand and

consumers are not interested in it. But later research from Cvijikj & Michahelles (2013) found that

entertainment posts have a positive impact on the like and comments ratio compared to non-

entertainment posts. It also had the strongest impact compared to information and remuneration content

type. Brand entertainment content posts have a positive impact on the number of likes from a BP

(Tafesse, 2015). Gummerus et al. (2012) found a significant positive correlation between customers’

perceived benefits of entertainment and customer satisfaction. Stephen et al. (2015) found a positive

relationship for posts that are perceived to be funny or humorous. Surprising articles are more likely

to make the NYT’s most e-mailed list (Berger & Milkman, 2012).

4.2.2 Information

Information is another important characteristic of BP’s. Informational content is about product

specifications, product reviews & product recommendations (Tafesse, 2015). A post that is rich in

information (e.g. launch of a new product, new industry segment) will increase a brand fan’s motivation

to contribute online. Furthermore, information was found to be one of the main factors for online CE

in the form of consumption and value creation (Cvijikj & Michahelles, 2013). Research showed a

positive attitude of consumers towards informational posts (de Vries et al., 2012). Stephen et al. (2015)

took a broader view of information-related posts, which were classified into product-related, value-

related & brand-related information, where value-related information covers posts that mention value- or

price-related information such as discounts or promotions. This characteristic “value-related

information” refers more to our transaction variable while product-related information fits well here.

Information BP’s enrich the brand popularity based on the like and comment ratio. But the study from

de Vries et al. (2012) showed an inconclusive effect (no effect vs. positive effect). Research from Swani

et al. (2017) found a negative relationship between information search posts (messages contain cues and

links that aim for information search) and the number of likes and comments. Posts containing the

product price or price comparison have a negative relationship with BPP (Lee et al., 2018). Information


about product availability also has a negative impact on the number of likes, and information about

obtaining the product has a negative impact on the number of comments (Lee et al., 2018).

4.2.3 Transaction

The third variable from the first supervised model is based on transaction. This characteristic includes

everything that is linked to remuneration (e.g. sweepstakes, bonuses, promotion deals, discounts, loyalty

programs etc.). Moreover, it refers to posts that include direct links to order and pay for a product

(Tafesse, 2015). The major focus of such a post is to end with a specific transaction between the company

and one or more consumers. It has a much broader scope than the promotion variable that we will use

in VMRA. Swani et al. (2017) found a negative relationship between direct-calls-to-purchase and BPP.

5 Message Strategy Approach

5.1 Literature review

Model 2 classifies posts into functional, experiential, emotional and brand resonance. This

categorization focuses on some traditional message strategies compared to the content focus of the

previous approach (Tafesse & Wien, 2017). Still, while having another viewpoint on classification, we

distinguish the different posts on the content they provide. The framework is derived from the study

from Ashley & Tuten (2015) which conducted a content analysis of the creative (message) strategies in

SMA. It investigated which type of message companies are posting (what is their SMS?) and how these

channels and strategies are related to maximize social engagement with its consumers. More valuable

to our research, it came up with a categorization of the top creative strategies. Functional appeals are the

most commonly used message strategy, followed by resonance and experiential appeals. An overview of

the most common message strategies used can be found in Table 6. The category resonance from Ashley

& Tuten (2015) (which focuses on the interaction between image and words) is, in our opinion, too

vague to apply in practice. This explains why we use brand resonance as a 4th category of the MSA.

This category focuses on everything which is based on the image of the brand. It has a broader and more

defined scope which will be further explained in the next sections.

Table 5: Literature Review: Message Strategy Approach

Authors | Functional | Experiential | Emotional | Brand Resonance
Tafesse & Wien (2017) | functional brand posts | experiential brand posts | emotional brand posts | brand resonance
Ashley & Tuten (2015) | functional appeals | experiential appeals | emotional appeals | resonance
Jahn & Kunz (2012) | functional value | hedonic value | / | /
De Vries & Carlson (2014) | functional value | hedonic value | / | /
Swani et al. (2014) | functional appeals | / | emotional appeals | brand strategy (corporate brand name & product brand name)
Swani et al. (2017) | functional appeals | / | emotional appeals | brand cue (corporate name & product name)
Davis et al. (2014) | functional brand consumption | / | emotional brand consumption | /
Lee et al. (2018) | / | / | brand personality-related (emotion & emoticon) | directly informative (brand mention)
Berger & Milkman (2012) | / | / | emotions | /
Tafesse (2015) | / | / | / | brand post consistency

Swani et al. (2014; 2017) analysed how several message strategies differ between a B2C and a

B2B environment. Their framework consisted of four different message strategy viewpoints: brand

strategy, message appeals, selling strategy and information search. A difference was made between

functional appeals and emotional appeals. While functional appeals refer to specific product

specifications, emotional appeals want to invoke emotions of the consumers. The brand strategy

viewpoint (which looks at the differences between mentioning the corporate brand name or mentioning

the product brand name) is most in line with the brand resonance category of our viewpoint. The main

difference is that brand resonance takes a broader view: it does not only look at “names” but also takes

the history of the brand or its slogan into account.

Table 6: Message Strategy Usage (Ashley & Tuten, 2015)

Tafesse & Wien (2017) provided a framework which consisted of 12 categories of BP’s. The 4

categories of our MSA are also part of this framework. The main difference is that our study focuses on different

classification approaches (viewpoints), while the main focus of the study from Tafesse & Wien (2017)

was on building one comprehensive framework. Jahn & Kunz (2012) classified their “content” category

further into functional value and hedonic value. We do not find the overall theme “content” well suited

to functional and hedonic value, since these two variables are more suitable for our MSA. Furthermore, the

other two category groups of their study (self-oriented and relationship-oriented) will be used in our

next Marketeer’s Orientation Approach. So that’s why we have adapted the content category of Jahn &

Kunz (2012) to this model and not to the previous Content Approach. Davis et al. (2014) identified five

core drivers that represent consumers’ motivation for brand consumption in a SM community. The

model consists of functional and emotional brand consumption (which are applicable to our MSA),

self-oriented and relationship brand consumption (which are relevant for the MOA) and social brand

consumption. The study from Davis et al. (2014) will also help us to better understand the different

categories.


5.2 Classification variables

Table 7: Common message themes of each classification variable: Message Strategy Approach

Variable | Common message themes
Functional | Product & service functional claims, product reviews, awards, green credentials etc.
Experiential | Sensory stimulation (e.g. visual, auditory, taste, odour etc.), physical stimulation (e.g. physical actions, performances, activities etc.) & brand events (product launches, festivals, fan events, sponsored events) etc.
Emotional | Emotion-laden language (sentiment analysis)
Brand Resonance | Brand image (e.g. brand logo, brand slogan, brand character), photos of branded products, celebrity association, brand history etc.

5.2.1 Functional

Functional BP’s are posts that highlight the functional attributes of a company’s products and services.

These kinds of posts focus on promoting the benefits of company products and services according to

performance, quality, affordability, efficiency, design & style criteria (Tafesse & Wien, 2017).

Functional BP’s can have an internal or external orientation. Internal-oriented functional posts focus on

product attributes and benefits that the company claims itself. An example: “We would like to

introduce ourselves to our new computers, which have a higher processor than you could ever dream

of!” External-oriented functional posts present benefits claimed by external reviewers, which the company

would like to share with its consumers. Davis et al. (2014) found that the main drivers for consumers’

functional brand consumption are to solve problems, send specific inquiries, search for information,

evaluate service before purchasing and gain access to specific deals. De Vries & Carlson (2014) found

that the functional value of the BFP positively influences the intensity of using the BFP. Consumers were

more likely to interact online when they perceived the information as useful. Ashley & Tuten (2015),

who focused on message strategies, found that the correlation between functional appeals and

engagement score was insignificant. They defined a functional appeal as the utility or functionality of the

product or service. The research from Swani et al. (2017) found a positive (but very small) relationship

between functional appeal posts and likes but a negative relationship with the number of comments.

These kinds of posts create fewer emotional impulses to react to the post.

5.2.2 Experiential

Experiential BP’s “evoke consumers’ sensory and behavioural responses. They highlight the sensory

and embodied qualities of the brand and often associate the brand with pleasurable consumer

experiences.” (Tafesse & Wien, 2017). They are further classified into 3 subcategories. (1) Sensory

stimulation focuses mostly on the 5 senses (e.g. visual, taste, odour etc.). (2) Physical stimulation

employs behavioural brand cues to amplify the physical qualities of the brand. A good example is when

Toyota posted a video of its new model, combining it with footage of extreme sportsmen and highlighting

the good physical quality of the car. (3) Brand events can be events about product launches, fan events,

sport events etc. (Tafesse & Wien, 2017). Ashley & Tuten (2015) found a positive significant correlation

between experiential appeals (i.e. how the customer experience relates to sight, sound,

taste, touch or smell) and engagement score.

5.2.3 Emotional

Emotional BP’s want to evoke consumer emotions. Most of the time they make use of emotion-laden

language, which evokes positive or negative feelings in consumers. An example: “I have

a terrifying bad day!” The words “terrifying” (derived from “terrify”) and “bad” are both negatively

emotionally charged (Tafesse & Wien, 2017). Besides emotion-laden language, Tafesse & Wien (2017)

also referred to emotional storytelling and humour-related posts as emotional posts. One of the 5 drivers

from Davis et al. (2014) for connecting to a brand is emotional brand consumption, which focuses on

enjoyable interactions (e.g. feeling privileged, recognized by the brand or satisfaction of curiosity).

Ashley & Tuten (2015) found a negative correlation between the Engagement Score and emotional

appeals, which mainly focused on how the customer will feel. Research from de Vries et al. (2012)

concluded that the share of positive comments has a positive effect on BPP (likes & comments) while

the share of negative comments only has a positive effect on comments (so not on likes). This is possibly

due to the fact that people want to confirm other people’s opinions or disagree and counter-react to

someone’s opinion rather than liking the post. Berger & Milkman (2012) found that articles that evoke

emotions such as awe, anger, anxiety or sadness are more likely to become viral than non-emotional

articles. We may say that articles with positive or negative content go more viral. Swani et al. (2017)

found that posts containing emotional appeals have a positive impact on BPP compared to non-

emotional appeals. A possible explanation for the negative correlation between emotional appeal and

the engagement score could stem from the fact that this engagement score is coming from Engagement

dB, which is something different than BPP (likes & comments). Lee et al. (2018) found a positive effect

of posts that represent emotions on CE (likes & comments). So, if a BP is emotional, the motivation of

a fan to participate in the content of a brand is met.
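As an illustration of how emotion-laden language could be detected automatically, the sketch below looks up the words of a post in a small word list. The lexicon, the function names emotion_words and is_emotional, and the binary rule are assumptions made for this example; an actual analysis would rely on an established sentiment lexicon or a trained model.

```python
import re

# Toy lexicon (an assumption for illustration only); it contains the two
# negatively charged words from the example post in the text.
NEGATIVE = {"terrifying", "terrify", "bad", "sad", "angry"}
POSITIVE = {"happy", "lovely", "great", "awesome"}


def emotion_words(text):
    """Return the emotionally charged words found in the post, by polarity."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {
        "positive": [t for t in tokens if t in POSITIVE],
        "negative": [t for t in tokens if t in NEGATIVE],
    }


def is_emotional(text):
    """Flag a post as emotional when it contains at least one lexicon word."""
    hits = emotion_words(text)
    return bool(hits["positive"] or hits["negative"])


print(emotion_words("I have a terrifying bad day!"))
# {'positive': [], 'negative': ['terrifying', 'bad']}
```

Such a word lookup ignores negation and context, so it should be read as a lower bound on what a sentiment analysis would capture.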

5.2.4 Brand Resonance

Brand resonance posts “are posts that direct attention to the brand promise and identity of the focal

brand” (Tafesse & Wien, 2017, p.9). The main focus is on brand image, brand personality, brand

association and branded products with the main goal to influence consumers’ brand attitude. Brand

image posts include the brand slogan, logo, brand name, aesthetic features, values or characteristics

(Tafesse, 2015). Red Bull, for example, uses its campaign slogan “Gives you wings” a lot in its

posts. The second variant shows photos of branded products. A BMW post showing a close-up of its new

model is an example of this. The third approach includes posts involving celebrities and influencers.

When we think about Nespresso, we immediately link it to George Clooney. Nespresso uses this

association a lot when posting new feeds. A last possible variant involves posts about the brand history


(Tafesse & Wien, 2017). Messages containing their corporate brand name have a positive relationship

with the number of comments but a negative relationship with the number of likes (Swani et al., 2017).

Lee et al. (2018) confirmed the negative relationship between posts containing a specific brand or

organization name and the number of likes but also found a negative relationship with the number of

comments. Tafesse (2015) found a positive relationship between BP consistency and audience response

(number of likes & shares of a BP). Tafesse (2015) referred to brand consistency as developing a uniform

organisational identity with a consistent brand position by making use of the brand’s name, logo, slogan,

values & aesthetic features in the BP’s. So brand consistency and brand resonance can be seen as

synonyms, while posts that contain the brand name are only a small subdivision of the brand

resonance category.

6 Marketeer’s Orientation Approach

6.1 Literature review

The third approach classifies BP’s into task-oriented content posts, relationship-oriented content posts

(social & brand interaction) & self-oriented (self-concept) content posts. This classification approach is

derived from the research from Kim et al. (2015), which focused on the marketeer’s perception of

customer’s engagement. The categorization is based on the salesmanship literature. It looks at which

different orientation viewpoints of communication a salesperson can take. Firstly, a salesperson can

make use of task-oriented communication which is highly goal-oriented. Secondly, salespersons can

focus on socializing and building personal relationships through interaction-oriented communication.

Thirdly, salespersons who use self-oriented communication make use of personal attributes or

experiences while communicating with others. Kim et al. (2015) adapted this salesmanship viewpoint

to their marketing viewpoint on social BP’s.

Table 8: Literature Review: Marketeer's Orientation Approach

Authors | Task-oriented | Relationship/interaction-oriented | Self-oriented
Kim et al. (2015) | task-oriented | interaction-oriented | self-oriented
Jahn & Kunz (2012) | / | relationship-oriented (social interaction value & brand interaction value) | self-oriented (self-concept value)
Davis et al. (2014) | / | relational brand consumption | self-oriented brand consumption
Swani et al. (2017) | / | customer relationship | /
Ashley & Tuten (2015) | / | interactivity | /
Stephen et al. (2015) | / | calls to action | /
de Vries et al. (2012) | / | social value & co-creation value | /

This approach is similar to previous research that focused on consumers’ perceived scheme through

the U&G theory, which classified BP’s into content-oriented posts (functional & hedonic: fun &

enjoyment), relationship-oriented posts (social & brand interactivity) & self-oriented posts (self-

concept) (Jahn & Kunz, 2012; Tafesse & Wien, 2017). We have not adopted the first content-oriented

category in our MOA for several reasons. In the first place, we think that the CA is a valuable approach

on its own, which is why we have included it in our study as a separate approach. Secondly, Jahn & Kunz (2012)

divided the content-oriented viewpoint into functional and hedonic value. In our opinion, the

functional category is more suitable for the MSA than for the MOA. Thirdly, the category hedonic

value is associated with a consumer’s perceived fun, pleasure and entertainment, which is more suitable

for the experiential category of the MSA. We can draw the same conclusion for the study from de Vries

& Carlson (2014) who made use of the same classification from Jahn & Kunz (2012) and adjusted it a

little bit. They also took functional & hedonic value into their framework but did not group them together

as content-oriented. So, there is too much overlap of the content-oriented category with other

approaches. That’s why we have adopted the framework from Kim et al. (2015) into our study. Our

model will focus on what content drives marketeers to increase BPP. Another remark worth mentioning

is that the studies from de Vries & Carlson (2014) and Jahn & Kunz (2012) were based on qualitative

surveys, so they did not really look at what content is stated in the post. Although the study from Tafesse

& Wien (2017) linked the study from Gummerus et al. (2012) to this approach, we would rather not link

them to each other. First of all, the research from Gummerus et al. (2012) studied the effects of

behaviours on perceived benefits and outcomes. So, it does not look at the content of the social posts

since it is a qualitative research. Secondly, the perceived benefits are social, entertainment and economic

which are not directly linked to the MOA. The entertainment perceived benefits would rather be linked

to the entertainment category of our CA.

6.2 Classification variables

Table 9: Common message themes of each classification variable: Marketeer's Orientation Approach

Variable | Common message themes
Task-oriented | Advertising, announcements of new products or services, coupons, discounts, sweepstakes etc.
Relationship-oriented (interactivity) | Customer feedback, links, voting, call to act, contest, quiz, customer testimony, customer reviews, customer services, Q&A etc.
Self-oriented | Friends, family, personal preferences, anecdotes and future plans etc.

6.2.1 Task-oriented

The first characteristic that could be an explanation for marketeers’ motivation for online content

creation is based on a task-oriented viewpoint. Previous research focused more on the customer

perspective (de Vries & Carlson, 2014). Task-oriented posts aim to increase sales or BPP through

traditional advertising. Advertising a certain brand or product through a persuasive message with

visuals, a new announcement about a product or service & online coupons, discounts, contests or

sweepstakes are some examples of task-oriented content. Task-oriented content was perceived to have

a significantly positive impact on the number of likes, comments and shares (Kim et al., 2015). It even

had a bigger impact on likes, comments & shares compared to interaction- and self-oriented content. In

the qualitative research from Jahn & Kunz (2012), the content-oriented variable functional value was

significantly positively related to fan page usage intensity. Therefore, we suggest that task-oriented

content will have a positive relationship with BPP.

6.2.2 Relationship/Interaction-oriented

Customer relationship can be defined as “posts that solicit information and feedback about customer

needs, expectations and experiences” (Tafesse & Wien, 2017, p.10). The main focus of relationship-

oriented posts is thus on social & brand interactivity (Jahn & Kunz, 2012). Tafesse (2015) used BP

interactivity in his model, which focuses on “the degree to which two or more communication parties

can act on each other” (Tafesse, 2015, p.931). Interaction-oriented content focuses on making the

relationship between customers and a brand stronger. Marketeers can post content about a personal

statement, a celebration, an opinion, the weather or entertainment. Furthermore, relationship posts can

ask for likes, comments or shares (Kim et al., 2015). Interactivity on SM is a two-way communication

between the brand and the consumers, as well as between the consumers themselves. Tafesse & Wien

(2017) further classified these posts into 3 categories: (1) customer service posts, which make common

service announcements and reminders, (2) customer testimonial posts, which highlight customers’

previous success stories, and (3) customer feedback posts, which ask through a Q&A for an opinion

about a brand’s products or services. Davis et al. (2014) described social brand consumption as the social

interaction between the consumers within a community (e.g. experience exchange, community

attachment, building links & social interaction) while relational brand consumption focuses on the

interaction between the brand and the consumers (e.g. co-creation of services, desire to know the real

people behind the brand & the desire for personalized interaction with the brand). We incorporate these

two divisions into one relationship-oriented category. Ashley & Tuten (2015) described interactivity as

the degree to which consumers can actively participate and engage with the brand.

De Vries & Carlson (2014) found a positive effect of social interaction value & co-creation value of

brand posts on CE. Some previous research already analysed the impact of interactivity content on BPP

(Cvijikj & Michahelles, 2013; de Vries et al., 2012). A remark here is that these studies focused more

on specific elements that could appear in a post (Media Type Approach) (e.g. question mark,

photo, link etc.). Our research will have a broader view, focusing on the content itself. A question in a

post will have a higher degree of social interactivity since it encourages people to react to the post, while

a link to another website will have a lower degree of interactivity (de Vries et al., 2012). Posts with a

higher degree of interactivity (e.g. contest or question) have a higher degree of enhancing BPP. An

exception is questions, which have a negative impact on likes since they encourage people to answer

the question and not to like it. In addition, posts containing a link have a negative effect on

comments since most of the time, people click on the link and do not come back to the specific BP’s.

This was also confirmed by Cvijikj & Michahelles (2013) where posts containing a picture or a status

had a positive effect on BPP compared to posts containing a link which has a higher factor of


interactivity. But our focus is on the content of the post and not on the media types of the posts. Posts

that make an effort to increase the customer-brand relationship through interactive communication will

have a higher intention to increase online engagement. Posts that ask for engagement through

specific questions or requesting likes/comments/shares etc. have a positive impact on the number of

likes and comments (Stephen et al., 2015), while the study from Tafesse (2015) found a low negative

relationship between BP interactivity and the number of likes and shares of a BP. Interaction-oriented

content has a positive impact on the number of likes, comments and shares compared to non-interaction-

oriented posts based on the study from Kim et al. (2015). The qualitative research from Jahn & Kunz

(2012) found a positive relationship between the relationship-oriented viewpoint and the intensity of fan

page engagement.
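The element-level cues discussed above (a question mark or a link in a post) are straightforward to extract, which the following minimal sketch illustrates. The function name interactivity_cues and the URL pattern are assumptions for this example; they approximate the kind of media-type features analysed in earlier studies and are not the content-based classification used in our approach.

```python
import re

# Simple pattern for web links; an assumption that covers common cases only.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def interactivity_cues(text):
    """Detect whether a post contains a question and/or a link."""
    # Remove links first so that a "?" inside a URL is not counted
    # as a question addressed to the reader.
    text_without_links = URL_PATTERN.sub("", text)
    return {
        "has_question": "?" in text_without_links,
        "has_link": bool(URL_PATTERN.search(text)),
    }


print(interactivity_cues("What a lovely day, what are your plans today?"))
# {'has_question': True, 'has_link': False}
```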

6.2.3 Self-oriented

Self-oriented content includes news, information or a story about the company or its products or an

event, program or campaign, which is sponsored by the company. It can also consist of a media post

(video or picture) of its employees, management or staff (Kim et al., 2015). Our analysis focuses on the

viewpoint of the marketeers and not of the consumers, which gives self-oriented content another

meaning. Still, previous research that has focused more on the consumers’ perceived intentions, helps

us to better understand the classification. Self-oriented posts (customer perspective) focus on individual

needs of an individual consumer. Jahn & Kunz (2012) concluded that there is a positive relationship

between this variable and fan page engagement. Tafesse & Wien (2017) defined personal BP’s as “posts

that center around consumers’ personal relationships, preference, and/or experience which can invoke

personally meaningful themes (family, friendship, personal anecdotes or future plans to initiate deeply

personal conversations with consumers)” (Tafesse & Wien, 2017, p.10). We will adapt the definition

from Tafesse & Wien (2017) which focused on the customer viewpoint to the marketeer’s viewpoint

since our research focuses on MGC. This has resulted in the following definition of self-oriented content.

Self-oriented content is content around the brand itself, preference, and/or experience with personal

themes (employees, staff, consumers’ relationship, management, company anecdotes or future plans).

It refers to marketeers who post about a company’s personal feelings, anecdotes or opinions. An example

of this post could be “Today, we are very happy to announce that our cousin will join our bartender’s

team!” Davis et al. (2014) classified self-oriented brand consumption further into self-actualization, self-

perception enhancement and self-branding. The study from Kim et al. (2015) found a positive

relationship between self-oriented content and the number of likes, comments & shares.

7 Viral Marketing Rules Approach

7.1 Literature review

The VMRA categorizes posts into event, product, promotion and entertainment. This classification, based

on the viral marketing rules, is focusing on how we can increase the spread of a specific post which can

be used for different marketing objectives (e.g. product launch). An exploratory research from Shen &

Bissell (2013) made use of this classification, analysing the factors that influence brand loyalty in the

beauty industry. This approach is worth mentioning as a first evaluation. In our view, however, this framework is less suitable for placing next to the previously mentioned approaches. First of all, entertainment is also part of our content classification approach. Second, promotion is an example of the transaction category of the CA, which has a broader scope. Also, the scope of the product category is too narrow in our opinion: it can refer to information about a product (CA, information category) as well as to product claims (MSA, functional category). Because of this overlap with the approaches described earlier, the VMRA does not fit next to them. Still, it can be valuable on its own for companies that want to classify their posts according to the viral marketing rules, independently of the previous approaches.

7.2 Classification variables

Table 10: Common message themes of each classification variable: Viral Marketing Rules Approach

| Variable | Common message themes |
|---|---|
| Event | Brand events (e.g. product launches, festivals, fan events, sponsored events etc.), cultural events (e.g. sport, film, TV shows), holidays, special days (e.g. anniversary) & weather. |
| Product | Product launch, reviews, opinions, tips. |
| Promotion | Price discounts, coupons, discount code, giveaways, customer contests, product competitions, sample, gift with purchase. |
| Entertainment | Funny, humorous, humorous items, artistic works, events |

7.2.1 Event

A (current) event post “focuses on themes that capture active talking points that target audience, such as cultural events, holidays, anniversaries, and the weather/season” (Tafesse & Wien, 2017, p.10). Cultural events can include topics like TV shows, film releases, sport competitions etc. Shen & Bissell (2013) focused on the sharing of a calendar, which is a broader viewpoint than that of Tafesse & Wien (2017), who did not incorporate brand events in their event category. They further classified it into four time-oriented subcategories: an event from the past, today, tomorrow or the future can be shared. An example: “Tomorrow everybody is welcome to our annual university drink!” In our research we will focus on the broad concept of an event and also include brand events in the event category.

Stephen et al. (2015) found a negative relationship between posts that refer to a major or minor holiday and the number of comments, but a positive relationship with the number of likes. A remarkable conclusion from the study by Lee et al. (2018) is that posts mentioning holidays have a large negative impact on BPP. Our event category is much broader than holiday mentions alone. But we agree that event posts will most of the time make people happy, since they can become excited about a specific event or anniversary.

7.2.2 Product

Posts categorized as product contain product-related information about a product launch or extension,

reviews, benefits, uses (how & when) or tips (Shen & Bissell, 2013; Stephen et al., 2015). An example:

“Our new model X is twice as fast compared to our previous model Y!” Stephen et al. (2015) found a

positive relationship between product posts & the number of likes & comments. Followers who receive

a message containing a product brand name are less willing to like or comment on this type of post

compared to posts that do not include the product brand name (Lee et al., 2018; Swani et al., 2017).

These studies are from later dates compared to the study from Stephen et al. (2015) who found a positive

relationship with BPP.

7.2.3 Promotion

Promotion posts aim to stimulate consumer demand and to entice consumers to take action

towards a buying decision (e.g. giveaway, coupon/discount code, sample/gift with purchase, comparison

to competition) (Shen & Bissell, 2013; Tafesse & Wien, 2017). Sometimes these posts are equipped

with links to direct pages where they can make use of promotional offers or sign into a competition

(Ashley & Tuten, 2015). An example of a promotion post can be “tag your best friend and win a free

dinner for 2 persons!” Previous research from Cvijikj & Michahelles (2013) found that remuneration

posts have a negative impact on the like ratio but a positive impact on the comment ratio. This could be

due to the fact that if you specifically ask your followers to tag someone in the comment, people will

not like the post. Another possibility is that when the winner of a contest has been announced, the post

becomes irrelevant. Posts containing deals (discounts and freebies) have a negative relationship with

BPP (Lee et al., 2018). Stephen et al. (2015) also found that posts which contain value information (e.g. pricing, discounts, coupons) and posts that ask followers to enter a competition (through sweepstakes or giveaways) have a negative impact on the number of likes & comments.

7.2.4 Entertainment

We refer to the entertainment category of model 1, which uses the same “entertainment” variable.

8 Unsupervised Approach

The previously mentioned approaches made use of an SL method to classify messages into the right category (Ashley & Tuten, 2015; Berger & Milkman, 2012; Lee et al., 2018; Stephen et al., 2015; Swani et al., 2014). SL is an expensive, time-consuming approach, but its performance will be higher compared to UL. UL, on the other hand, tries to get meaningful insights out of unstructured text data without human involvement to label the variables (Netzer et al., 2012; Zhang et al., 2017). In addition, UL is quite new in SM content analysis, while supervised learning has been commonly used. Zhao et al. (2001) looked at which different topics appeared on Twitter compared to the New York Times (NYT), without checking the relationship with engagement. The study from Netzer et al. (2012) focused on understanding large consumer-generated data through text-mining analysis and a network analysis framework. Zhang et al. (2017) modelled the role of message content and influencers in SM rebroadcasting. To reach more people, companies try to write more specific social media messages that are more likely to be rebroadcasted. A follower can share a company’s post with his/her friends or retweet a “tweet” to his/her followers, which expands the reach of the original message. To check the underlying dimensions, the study made use of a factor analysis applied to a large binary data matrix: each message is coded as one or zero for each word in the word bank, depending on whether the word is included in the message or not (De Pelsmacker & Van Kenhove, 2007). Three factors were found with an eigenvalue greater than 1 (School: school, mba, prof etc.; Finance: equity, sector, fund etc.; Politics: tax, votes, Obama etc.). The study concluded that rebroadcasting activity depends on the content of the message. That is why marketeers should focus on posting messages about topics that are more likely to be rebroadcasted. Specific to this study, school- and politics-oriented messages are more likely to be rebroadcasted than finance-oriented messages. Compared to the study from Zhang et al. (2017), the study from Netzer et al. (2012) had a broader range: its application was demonstrated by building a network on sedan car and diabetes drug forums. Although it is useful to see how UL has been used in content classification, it is not useful to compare the different classification frameworks and try to come up with one framework, since UL depends on the characteristics of the data. The three themes (school, finance and politics) are inherent to the business school data coming from the study from Zhang et al. (2017).
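As an illustration of the zero/one message-by-word matrix described above, the following Python sketch codes each message as 1 or 0 for every word in a word bank. The word bank, the messages and the function name `binary_matrix` are hypothetical and chosen only for illustration; the factor analysis itself is not shown.

```python
import re


def binary_matrix(messages, word_bank):
    """Return one row per message: 1 if the word-bank word occurs, else 0."""
    rows = []
    for message in messages:
        # Lower-case and keep letter runs only, so "MBA," matches "mba".
        tokens = set(re.findall(r"[a-z]+", message.lower()))
        rows.append([1 if word in tokens else 0 for word in word_bank])
    return rows


# Hypothetical word bank and messages (not taken from the cited studies).
word_bank = ["school", "mba", "equity", "fund", "tax", "votes"]
messages = [
    "Our MBA school ranking is out",
    "New equity fund launched today",
    "Tax votes expected this week",
]

matrix = binary_matrix(messages, word_bank)
for row in matrix:
    print(row)
```

Each row of `matrix` corresponds to one message and each column to one word of the word bank; a matrix of this shape is what a factor analysis would take as input.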

9 Media Type Approach

The last approach looks at the extent to which different types of media have been used in previous classification frameworks. Our data consist of Facebook posts without the possibility to see whether a picture or video was added to the post. This media approach will therefore not be applied to our data, but it is worth mentioning how it has been used and whether there is a “best” approach for media classification. Moreover, the main focus of our research is on the different content approaches and not on different media approaches. Still, it is worth describing how media elements have been used in order to arrive at a well-established media approach for further research. Media elements have been commonly used in previous frameworks. Some literature made use of text, photos, videos, links etc. (Kim et al., 2015; Sabate et al., 2014), while mostly later research came up with newer terms and looked at the interactivity and vividness of each type of media element (Cvijikj & Michahelles, 2013; L. de Vries et al., 2012). Our MTA consists of vividness and interactivity, which are explained further below. A literature review of the MTA can be found in Table 11.

Table 11: Literature Review: Media Type Approach

| Authors | Vividness (V) / Interactivity (I) |
|---|---|
| de Vries et al. (2012) | V: 3 levels: (1) low: pictorial (photo or image), (2) medium: event (application at the brand page that announces an upcoming (offline) event of the brand) and (3) high: video (mainly videos from YouTube). I: questions & links |
| Tafesse (2015) | V: 3 levels: high (video), moderate (2 images) and low (0/1 images). I: 3 levels: high, moderate and low |
| Sabate et al. (2014) | richness: images, videos and links |
| Kim et al. (2015) | text, photo or video |
| Ashley & Tuten (2015) | animation (motion) |
| Lee et al. (2018) | message type: app, link, photo, status update or video |
| Stephen et al. (2015) | rich media: images and videos & URLs: links |
| Cvijikj & Michahelles (2013) | 4 levels: (1) photos (V = low, I = low), (2) status (V = no, I = low), (3) video (V = high, I = high) and (4) link (V = medium, I = high) |

9.1 Interactivity

A first way of increasing the importance of a BP is interactivity. It is “the degree to which two or more communication parties can act on each other, on the communication medium, and on the messages and the degree to which such influences are synchronized” (Liu & Shrum, 2002, p.54). We also refer to interactivity in the MOA. As mentioned before, the main difference is that here we look at which media types drive interactivity, while the relationship/interaction variable looks at what content can be found in the post. These variables are of course closely interrelated and support each other. Posts that contain questions or links to a website have a higher degree of interactivity than content-only posts. The more possibilities a post offers to get involved (links, comment options, surveys…), the higher its interactivity value. Research showed that posts that contain a question have a negative relationship with likes but a positive relationship with comments (de Vries et al., 2012). A later study from Tafesse (2015), however, found a negative relationship between BP interactivity and both likes and shares.

9.2 Vividness

Another variable which has commonly been used as a media type is vividness. It indicates the degree to which a BP stimulates the five senses. For example, a video has a higher vividness than a photo, since it stimulates not only sight but also hearing. The study from de Vries et al. (2012) classified vividness into 3 levels (low: pictorial, medium: event and high: video). A later study from Cvijikj & Michahelles (2013) went a step further and combined vividness and interactivity for each type of media. Four types of media were taken into account: (1) photos (V = low, I = low), (2) status (V = no, I = low), (3) video (V = high, I = high) and (4) link (V = medium, I = high). Their analysis showed that low interactive posts (i.e. photos and status updates) increase the total level of engagement, while vivid content (i.e. videos, photos and links) increases the reach of the message. The study from Sabate et al. (2014) also focused on BPP in terms of the number of likes and comments. But instead of vividness and interactivity, the independent variable (IV) of this research is the richness of a BP, which takes images, videos and links into consideration (Figure 5).

Figure 5: Conceptual Framework (Sabate et al., 2014)

Stephen et al. (2015) also used this rich media category as a media element, but treated URLs (links to other websites) as a separate media element. Richness of the content in terms of images and videos increases the impact in terms of likes, while videos have no effect on the likelihood of receiving more comments on a post. To increase the number of comments, marketeers can publish posts with images or without links, since links have a negative influence on the number of comments on a post. Another interesting remark is that including images seems to have a more powerful impact on CE than including videos, since images have an impact on likes as well as on comments. While the study from de Vries et al. (2012) concluded that vividness has a positive impact on the number of likes and Cvijikj & Michahelles (2013) concluded that vividness increases the reach of the message, the study from Stephen et al. (2015) found little evidence that media elements (rich media, which corresponds to a high vividness factor, or URLs) have an impact on CE. The same conclusion could be drawn for mentioning holidays. While marketeers find media elements really important, they sometimes seem to be ineffective in increasing social engagement with consumers. Later research from Tafesse (2015) found

a positive relationship between brand vividness and the number of shares of a BP, but not with the number of likes. Animation was one of the message strategies in the framework from Ashley & Tuten (2015). Although not much information is given on the extent to which animation is linked to media elements, it takes another viewpoint compared to previous research (vividness & interactivity). Since our data contain no information about media elements, we will not include this approach in our model.
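For a dataset that does record the media type of each post (unlike ours), the four-level coding attributed above to Cvijikj & Michahelles (2013) could be applied with a simple lookup. The sketch below is only an illustration: the dictionary layout and the helper name `code_media_type` are assumptions, not part of the cited study.

```python
# Vividness (V) and interactivity (I) levels per media type, as listed in the
# text for Cvijikj & Michahelles (2013). The dictionary layout and the helper
# name are illustrative assumptions.
MEDIA_LEVELS = {
    "photo": {"vividness": "low", "interactivity": "low"},
    "status": {"vividness": "no", "interactivity": "low"},
    "video": {"vividness": "high", "interactivity": "high"},
    "link": {"vividness": "medium", "interactivity": "high"},
}


def code_media_type(media_type):
    """Return the (vividness, interactivity) pair for a post's media type."""
    levels = MEDIA_LEVELS[media_type.lower()]
    return levels["vividness"], levels["interactivity"]


print(code_media_type("video"))
```

An unknown media type raises a `KeyError`, which makes unexpected values visible instead of silently coding them.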

10 Valence

Sentiment analysis is a type of data mining that measures the sentiment of a piece of text (blogs, reviews,

newspapers, tweets, posts etc.) through natural language processing (‘Sentiment Analysis’, n.d.). It tries

to get useful insights out of complex data and into how emotionally charged posts are. It is a commonly used technique in content analysis. Valence of a BP has been used as a variable in previous content classification research and can be calculated through sentiment analysis (de Vries et al., 2012). It refers to a positive, neutral or negative minded post. The study from Setty et al. (2014)

conducted a sentiment analysis (valence approach) on life event posts. Posts were classified into happy,

neutral or sad Facebook posts. To check the sentiment (polarity) of a specific word, they made use of

the SentiWordNet dictionary. Another study from Hopkins & King (2010) classified blogs based on the

American election of 2008 into a sentiment category going from extremely negative (-2) to extremely

positive (2). Berger & Milkman (2012) used positivity (the difference between the percentage of positive

words and negative words in a specific article) as a valence factor while Goh et al. (2013) used valence

as the net positivity (the number of positive concepts minus the number of negative concepts).

Emotionality defined as the percentage of words that are classified as positive or negative was another

variable from Berger & Milkman (2012). It also goes beyond mere valence to study how emotions drive

social transmission taking different emotions into account compared to previous research (de Vries et

al., 2012; Hopkins & King, 2010). Characteristics that were analysed were anger, anxiety, sadness,

awe (feeling of facing something greater than yourself), positivity & emotionality.
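The word-based valence measures mentioned above can be made concrete with a small Python sketch. It is a simplified illustration under stated assumptions: the two word sets are a tiny hypothetical lexicon, net positivity is counted on words rather than on the "concepts" used by Goh et al. (2013), and the function name `valence_measures` is invented for this example.

```python
import re

# Tiny hypothetical lexicon; real studies use dictionaries with thousands of words.
POSITIVE = {"happy", "great", "love", "wonderful"}
NEGATIVE = {"sad", "bad", "cancelled", "sorry"}


def valence_measures(text):
    """Compute simple word-level valence measures for one post."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n_pos = sum(token in POSITIVE for token in tokens)
    n_neg = sum(token in NEGATIVE for token in tokens)
    total = len(tokens) or 1  # avoid division by zero for empty posts
    return {
        # percentage of positive words minus percentage of negative words
        "positivity": 100 * (n_pos - n_neg) / total,
        # percentage of words that are either positive or negative
        "emotionality": 100 * (n_pos + n_neg) / total,
        # number of positive words minus number of negative words
        "net_positivity": n_pos - n_neg,
    }


result = valence_measures("What a great and wonderful party, sorry it is over")
print(result)
```

In this example the post has ten words, two of them positive and one negative, so positivity is 10%, emotionality is 30% and net positivity is 1.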

Sentiment analysis is mostly applied through automatic text coding. Each word is checked

with a lexicon library (containing thousands of words) which gives a value -1, 0 or 1 which refers to a

sentiment type (negative, indifferent or positive). A second commonly used approach is based on

machine learning, which has a higher accuracy but is also more time consuming. Most of the time these

two approaches have been combined to increase performance. A variable other than valence or emotion

that has been used in previous research is emotional appeal which focuses on setting up positive or

negative emotions by specific content used in a BP (Swani et al., 2014, 2017). The difference compared

to valence is that emotional appeal had to be coded by individual people while valence can be

automatically coded by sentiment analysis. Since our research does not focus on sentiment analysis, we will not include valence in any of our proposed frameworks. Still, it is worth mentioning how valence has been used in past literature, for further research.

11 B2B vs B2C

Next, we will give a brief overview of the differences that have been found between B2B and B2C in the effect of categorization types on BPP. Our study will focus on the B2C view, since our data come from bars & restaurants that have online brand pages. B2B refers to a transaction conducted between one company and another, while in B2C a company sells directly to an individual consumer. An example of B2B is a company selling car engine parts to a car manufacturing company. Buying a new cell phone at Fnac is an example of B2C. The focus of B2C is on customer needs, met through products or services, while the focus of B2B is more on improving other businesses’ operations through services or products (Chen, 2019).

Figure 6: Conceptual Framework (Swani et al., 2017)

A study from Swani et al. (2017) evaluated the popularity of SM posts, comparing B2B to B2C. The framework is based on SM message content strategies, which consist of brand cues (corporate name and product name), message appeal (functional and emotional), selling strategy and information search, and which came from their previous research (Swani et al., 2014). The DV is the popularity of SM messages (likes and comments) (Figure 6). Market type is the moderator, which can strengthen or weaken the relationship between SM message content strategies and the popularity of SM messages. The control variables taken into account are the size of the Facebook fan base and the message time (the time between the posting of the message and the storage of the post). While the B2B environment is characterized by highly involved and rational situations (high level of cognition), the B2C view is characterized by less involvement and more emotional triggers (low level of cognition). This is also confirmed by their previous research in 2014, where B2B tweets have a more functional message

appeal, while in a B2C environment tweets have a more emotional appeal. Secondly, B2B marketeers focus more on corporate brand strategies, but product brand strategies seem to appear equally often in B2B and B2C. Thirdly, direct calls to purchase (“hard sells”) are more commonly used in a B2C environment. Finally, embedded links and cues, as well as hashtags, are more likely to be used in B2B tweets than in B2C tweets. In addition, characteristics of B2B and B2C can change over time. The results from 2016 indicate that B2B message posts receive a higher number of likes but a lower number of comments than B2C messages. This suggests that in a B2C environment, people are more likely to comment on message posts. Moreover, the use of functional and emotional appeals, corporate brand names and information search enhances the popularity of B2B brand posts compared to B2C brand posts (Swani et al., 2017).

12 Human classification coding

Our model (which will be explained in the next section) will consist of 5 different approaches. Four of the 5 approaches (BCA, CA, MSA & MOA) make use of an SL approach, which means that human judgement is involved in classifying the different variables. Before going deeper into the methodology, we first want to give our thoughts and remarks on the human coding of the first 1000 posts of our dataset. Some categories were more straightforward to classify than others. The 4 supervised approaches which we will use in this research have been described in depth in previous sections. We refer to Table 2 and the literature review for a better understanding of the different variables which are used in each classification framework. As mentioned before, we did not take the VMRA into our models, since this model has too much overlap with the other approaches. The labelled dataset will be used to test the performance of the different classification approaches as well as to build a prediction model, which will be used to classify all messages of our data. Table 12 (p. 35) gives an overview of how many posts were classified into each variable of each approach, together with an example of a message from our data. The coder based his classification on the literature review and the explanation of the different variables in previous sections. Facebook posts without message text were coded as zero for every category. Some posts had no text content because they consisted only of a media element (e.g. a shared video or image). As mentioned before, our dataset did not include the media elements of the Facebook posts.

12.1 (Base) Content Approach

The Base Content Approach can be derived from the Content Approach. First of all, if a post was coded as entertainment or as transaction, then the post was assigned to the transaction/entertainment category of the BCA. Secondly, since the information category is the same in the BCA as in the CA, there was no need to code the information category again. 631 posts were classified as entertainment. Since our data come from bars and restaurants, most of the time they post about new

upcoming events. Even though some previous research mentioned that entertainment is sometimes unrelated to the brand, this hardly applied to our data. Information posts aim to share valuable content with customers, for example the weather conditions and whether it will be possible to play golf, or a bar that is looking for new bartenders. If we applied this category to a company that manufactures its own products, it would be more about product specifications, product reviews or product recommendations. The third and last category is transaction. Applied to our data, transaction posts are about deals on drinks or food, contests to win prizes, sweepstakes etc.
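The derivation of the BCA labels from the CA labels can be sketched in a few lines of Python. The sketch follows the rule given in Table 12 (MD0_INF is the same as MD1_INF, and MD0_TRA.ENT is set when MD1_ENT or MD1_TRA is set). The variable names come from Table 12; the function name `derive_bca` and the 0/1 encoding of the inputs are assumptions made for this illustration.

```python
def derive_bca(md1_inf, md1_ent, md1_tra):
    """Derive the two BCA labels from the three CA labels (each coded 0 or 1)."""
    return {
        # The information category is identical in both approaches.
        "MD0_INF": md1_inf,
        # Entertainment and transaction are merged into a single category.
        "MD0_TRA.ENT": 1 if (md1_ent or md1_tra) else 0,
    }


# A post coded as entertainment only, and a post without message text
# (coded as zero for every category).
print(derive_bca(md1_inf=0, md1_ent=1, md1_tra=0))
print(derive_bca(md1_inf=0, md1_ent=0, md1_tra=0))
```

Because a post can carry more than one CA label, the merged category is computed with a logical "or" rather than by adding the two labels.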

Table 12: Overview Human Coding Classification

| Approach | Variable | Frequency | Relative Frequency | Message example |
|---|---|---|---|---|
| Base Content Approach | MD0_INF | 243 | 24% | Same as MD1_INF. |
| Base Content Approach | MD0_TRA.ENT | 575 | 58% | MD1_ENT or MD1_TRA. |
| Content Approach | MD1_INF | 243 | 24% | ....ATTENTION GOLFERS....GOLF FOR TONIGHT HAS BEEN CANCELLED. YOU WILL NOT NOT HAVE TO MAKE IT UP, BUT IF YOUR BORED STOP DOWN AND HAVE A COLD ONE! |
| Content Approach | MD1_ENT | 631 | 63% | The neighborly bar will be hosting a Celebaration of life, for our dear friend Kevin Coffey, Thursday December 15, from 3-7, please join us and his sisters Kim and Lisa |
| Content Approach | MD1_TRA | 352 | 35% | ******CONTEST TIME****** WERE GIVING AWAY FREE 7 TICKETS ($100 VALUE) TO OUR ICE RAFFLE THE DAY OF THIS CONTEST. TO BE ENTERED INTO THE DRAWING, ALL YOU HAVE TO DO IS LIKE THE NEIGHBORLY BAR PAGE, COMMENT ON THIS POST (On the Neighborly Bar's Facebook page) YOUR FAVORITE BAIT TO USE, AND SHARE THIS POST! Last day to enter is Sunday Jan. 31ST! WINNER WILL BE ANNOUNCED MONDAY FEB. 1ST! |
| Message Strategy Approach | MD2_FUN | 22 | 2% | Dancing Saturday night to the greatest band in the northland Gypsy Road 9-1, Neighborly bar |
| Message Strategy Approach | MD2_EXP | 778 | 78% | GYPSY ROAD TONIGHT!!! 9-1. Cmon down and dance!!! |
| Message Strategy Approach | MD2_EMO | 68 | 7% | a SUPER THANK YOU, to the lovely ladies who cooked for the packer game for me, I THANK-YOU SO MUCH, for all the great food and cake, also CONGRATS to KEVIN BRAVICK, he won the trash talk raffle for $660.00!!!! what a great game and wonderful party, THANK YOU EVERYONE!!!! |
| Message Strategy Approach | MD2_BRA | 57 | 6% | With a heavy heart we say goodbye to our GM ,Jed Miller today, so many great memories, we will all miss you |
| Marketeer's Orientation Approach | MD3_TAS | 665 | 67% | IF YOU SEE OUR BEAUTIFUL BARTENDERS MARIAH, AND ASHLEY, TODAY WISH THEM BOTH A HAPPY BIRTHDAY!!! |
| Marketeer's Orientation Approach | MD3_REL | 98 | 10% | WELL, ITS SUMMERTIME, TIME TO HAVE A PARTY! a BIG PARTY, BIGGER THAN THE LAST 29!!!! ITS TIME TO KICK OFF OUR 30TH YEAR WITH OUR ANNUAL SUMMER PARTY. THIS ONE IS GOING TO BE BIGGER AND BETTER THAN THE REST, AND THE PRIZES ARE AMAZING! STOP IN FOR DETAILS! PLEASE SHARE! |
| Marketeer's Orientation Approach | MD3_SEL | 79 | 8% | THANK YOU TO EVERYONE WHO ATTENDED OUR 30TH SUMMER PARTY, ALSO OUR FAMILY, FRIENDS, FOR ALL YOUR HELP, AND OUR WONDERFUL HARDWORKING BARTENDERS, THE BANDS, BOOTH WORKERS, STOCK BOYS, THE LIONS CLUB, AND MOST OF ALL----MOTHER NATURE, THANKS FOR HOLDING OUT TILL IT WAS OVER!!! |

12.2 Message Strategy Approach

The MSA consists of functional, experiential, emotional and brand resonance. Although this is a well-established approach on its own, we have some remarks on why we think it is not the most applicable approach for our data. First of all, our data come from restaurants, bars and local sport companies. Most of the time, they post about their specific promotions, events or sweepstakes. These kinds of companies mostly offer services to their customers and do not have a “product” to deliver. This makes it difficult to classify posts into the functional category, which mainly focuses on the

functional claims of services or products. Based on our data, these kinds of companies do not even post about their “best service claims.” We only categorized a post as functional if it really makes a claim about the quality of the service or the drinks/food delivered. We did not categorize experiential posts as functional, even if they contained an adjective used to make the post more attractive. Such an adjective can be seen as a functional claim, but it is not the core message of the post. Only when the main focus of the post is a functional claim about a product or service did we categorize it as functional.

The framework was mainly dominated by experiential posts which consisted of 778 of the 1000 posts

in the labelled test data. Experiential posts are mainly about the promotion of events that can lead to physical action: the company wants to stimulate its customers to come to its (brand) events or simply to come by tonight for a special promotion. This category is a kind of mix between entertainment (event) and transaction (promotion) of the previous CA. Posts that congratulate someone were only classified as experiential if they also tried to stimulate a reaction from consumers, for example by saying “come and celebrate

tonight our bartenders’ birthday!” Posts were classified into the emotion category if they really wanted

to evoke emotions. For example: talking about a beautiful day, saying goodbye to someone who worked

for a long time at the bar or thanking everyone for coming to a charity event. The last category of the

MSA classifies posts into brand image. The main focus of these posts is on the direct relation to the

image of the brand. Applied to our data, it is about saying happy birthday to someone of the company,

talking about the history of the company, thanking people from the company etc.

12.3 Marketeer’s Orientation Approach

The main focus of relationship-oriented posts is to end with an “interaction.” If the post is only about the promotion of an event, then it is classified only as task-oriented. So a post that says “come tonight to our new event” is not seen as a real call to action but as a type of promotion. Signing up for an online contest, calls to action or applying for a job at the bar are some examples of relationship-oriented messages. Posts that thank people or wish staff members a happy birthday were classified as self-oriented posts.

12.4 Takeaways

1. Classification depends mainly on human interpretation and judgement; other people can interpret messages differently and classify them into another category.

2. Classification depends on the industry of the companies.

3. The Message Strategy Approach is dominated by the experiential category, while the Marketeer’s Orientation Approach is dominated by task-oriented posts. This is mainly due to the characteristics of our data, which come from bars and restaurants. In the manufacturing industry, for example, the share of the functional category of the Message Strategy Approach would be much higher, since these kinds of companies promote their products rather than events.

13 Methodology

13.1 Data

The data, coming from Facebook, were collected over 6 years, between the first of January 2011 and the 30th of December 2016. In total, 240,210 different posts were assembled from 476 different Facebook pages. We believe that a time span of 6 years is sufficient and valuable for our research. For each post, the feed_id (unique value) was collected, together with the time and date of the post creation (feed_created_time), the content of the message (feed_message), the number of reactions, shares and comments of the post (reactions_count, shares_count & comments_count), the page_name (ID of the company’s Facebook page) and the time and date when the post was extracted (extracted_on). Through a random check of 10 different page IDs, we could verify what kind of companies our data consist of (Exhibit 2). All 10 Facebook pages had “bar” as a page tag. A closer look at some random Facebook posts also made it clear that our data come from the bar/restaurant industry.
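To make the structure of one record explicit, the fields listed above can be written down as a small Python data class. The field names are the ones given in the text; the Python types, the class name `FacebookPost`, the helper `has_text` and all example values except the message text are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FacebookPost:
    feed_id: str                  # unique value per post
    feed_created_time: datetime   # time and date of the post creation
    feed_message: str             # content of the message
    reactions_count: int
    shares_count: int
    comments_count: int
    page_name: str                # ID of the company's Facebook page
    extracted_on: datetime        # time and date of extraction

    def has_text(self):
        """Return True if the post contains message text (hypothetical helper)."""
        return bool(self.feed_message.strip())


# Example record: the message is taken from Table 12, all other values are made up.
post = FacebookPost(
    feed_id="123_456",
    feed_created_time=datetime(2016, 1, 31, 18, 0),
    feed_message="GYPSY ROAD TONIGHT!!! 9-1. Cmon down and dance!!!",
    reactions_count=12,
    shares_count=3,
    comments_count=1,
    page_name="example_page",
    extracted_on=datetime(2017, 1, 15, 9, 30),
)
print(post.has_text())
```

A check such as `has_text` can be used to identify posts that consist only of a media element, which were coded as zero for every category during human coding.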

13.2 Model description

The conceptual framework of our thesis can be found in Figure 7. Our methodology focuses on answering the two research questions. First of all, we want to evaluate which approach is most suitable for content classification (performance analysis). Secondly, we look at the relationship of each variable of each content classification approach with BPP (number of reactions, shares & comments) (impact analysis).

In total, our research takes five different approaches into account for the classification of Facebook

posts. The first model is based on the concept of topic modelling which is an unsupervised method which

automatically classifies documents into themes (‘A gentle introduction to topic modeling using R’,

2015). We have chosen topic modelling since it is the most suitable unsupervised modelling technique for message data. The other four approaches (BCA, CA, MSA & MOA) have been described in depth in previous sections. We refer to the literature review for a better understanding of the different variables which are used in each classification framework. As mentioned before, we did not take the VMRA into our models, since this model has too much overlap with the other approaches.

First of all, we explain the unsupervised model, which makes use of topic modelling, together with its evaluation approach. Since this model does not make use of a labelled outcome, we can label the entire dataset, as well as evaluate the performance of the model, by applying it straight to all messages. The approach for our supervised models is a little different. Here, we differentiate between evaluating the performance of the model and labelling the entire dataset, which is needed for our impact analysis. First, we apply Random Forest (RF) as a classification algorithm on the 1000 human-classified posts (labelled dataset). We split this labelled dataset into a training and a test set, which is needed to evaluate the performance (performance analysis) of our supervised classification frameworks. We will try to come up with a “best” classification approach which is suitable in SM. Secondly, we conduct a basic impact analysis of all the posts on BPP (number of reactions, shares & comments). Since labelling 240 210 posts for every supervised classification approach would be too time-consuming, we build a predictive RF classification algorithm on the labelled dataset (1000 posts) and apply this algorithm to the whole dataset (so no test set is included).

Figure 7: Conceptual Framework

Our methodology is structured as follows. First, a brief explanation is given of the basic matrix which is needed for applying topic modelling as well as RF. Secondly, we explain the methodology and performance metrics of our topic model (unsupervised algorithm). Thirdly, the (supervised) classification algorithm RF is explained, followed by the method for the performance evaluation of RF. The last part consists of the methodology to test the variables of each content classification approach on BPP (number of reactions, shares & comments).

13.3 Data preparation

The basis for the topic modelling method as well as for the supervised classification algorithm is the creation of a document term matrix (DTM). A document term matrix has documents in the rows (Facebook posts, in our data) and the words which appear in each document in the columns (Feinerer, 2018). To arrive at this matrix some pre-processing is needed. First of all, non-recognizable characters and emoticons were deleted from each Facebook message. Next, a corpus was created, which is a collection of text documents that allows us to clean the documents (e.g. remove numbers, remove punctuation, remove stop words, stem the messages etc.). To get rid of unnecessary words, terms needed to appear in at least 0.5% of the Facebook posts to stay in the matrix.
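As an illustration of the matrix construction described above, the following Python sketch builds a small DTM and applies a minimum document-frequency filter. It is not the thesis code (which was written in R); the function name build_dtm, the toy posts, the stop-word set and the 50% threshold are assumptions chosen for the example, and stemming is omitted.

```python
from collections import Counter

def build_dtm(posts, stop_words=frozenset(), min_doc_share=0.005):
    """Return (terms, rows) where rows[i][j] counts term j in post i."""
    # Tokenise: lower-case and keep alphabetic tokens only
    # (this drops numbers and punctuation).
    tokenised = []
    for post in posts:
        cleaned = "".join(ch if ch.isalpha() else " " for ch in post.lower())
        tokenised.append([t for t in cleaned.split() if t not in stop_words])
    # Document frequency: in how many posts does each term occur at least once?
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    min_docs = min_doc_share * len(posts)
    terms = sorted(t for t, df in doc_freq.items() if df >= min_docs)
    index = {t: j for j, t in enumerate(terms)}
    rows = []
    for tokens in tokenised:
        row = [0] * len(terms)
        for t in tokens:
            if t in index:
                row[index[t]] += 1
        rows.append(row)
    return terms, rows

# Toy posts; a 50% threshold is used because the example has only 3 documents.
posts = ["Live band tonight!", "Free beer tonight, come early", "Band plays at 9"]
terms, dtm = build_dtm(posts, stop_words={"at", "come"}, min_doc_share=0.5)
```

Only terms occurring in at least half of the toy posts survive the filter, which mirrors the effect of the 0.5% condition on the full dataset.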

[Figure 7 content: Unsupervised classification: Topic Approach (party, food, performance). Supervised classification: Base Content Approach (information, entertainment/transaction); Content Approach (information, transaction, entertainment); Message Strategy Approach (functional, experiential, emotional, brand resonance); Marketeer's Strategy Approach (task-oriented, relationship/interaction-oriented, self-oriented). The approaches feed a performance analysis (evaluation of classification approaches) and an impact analysis on Brand Post Popularity (number of reactions, number of shares, number of comments), with control variables number of words and weekend day.]

13.4 Unsupervised algorithm

Our first model is based on UL, which requires no human involvement to label the posts. Through this technique we want to capture the underlying structure or dimensions in the data without knowing the corresponding labelled output beforehand. Even though previous research (Netzer et al., 2012) made use of factor analysis, we use topic modelling for our unsupervised classification approach. It is a method specifically developed for text content, while factor analysis is more commonly used on survey data. Topic modelling looks for “topics” in the collection of posts and discovers hidden structure in the data. One of the main advantages of topic modelling is that, compared to hard clustering methods, it allows words to overlap across the different topics. We make use of Latent Dirichlet allocation (LDA), which is a “probabilistic model of a collection of composites made up of parts” (‘Your Easy Guide to Latent Dirichlet Allocation’, 2018). Applied to our data, each Facebook message is stored as a separate document (composite) which consists of different topics (parts), and each topic consists of a bag of words (Silge & Robinson, 2017). The main idea of topic modelling is to derive the hidden structure in the data, given the words and documents. This is done by an iterative process which repeatedly tries to recreate each document by adjusting the relative importance of words in topics and of topics in documents until the best “topic structure” is found.

To find the optimal number of topics, we make use of the ldatuning package from Murzintcev (2019). The method trains multiple LDA models at once, one for each candidate number of topics k. To evaluate the models we follow the simple approach which states that the best number of topics is where the metrics Arun2010 and CaoJuan2009 are minimized and the metrics Deveaud2014 and Griffiths2004 are maximized. Since our supervised classification approaches contain between 2 and 4 categories, we do not want to have too many topics and let the parameter k range between 2 and 5.
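The selection rule above can be sketched in a few lines of Python. This is only an illustration of the minimise/maximise logic, not of how ldatuning computes the metrics; the helper name best_k_per_metric and all metric values are made up for the example.

```python
def best_k_per_metric(scores, minimise):
    """scores: {metric: {k: value}}; minimise: names of metrics to minimise.

    Metrics not listed in `minimise` are maximised.
    """
    best = {}
    for metric, by_k in scores.items():
        pick = min if metric in minimise else max
        best[metric] = pick(by_k, key=by_k.get)
    return best

# Made-up metric values for k = 2..5, for illustration only.
scores = {
    "CaoJuan2009": {2: 0.30, 3: 0.12, 4: 0.18, 5: 0.25},
    "Deveaud2014": {2: 1.10, 3: 1.90, 4: 1.60, 5: 1.20},
}
best = best_k_per_metric(scores, minimise={"CaoJuan2009", "Arun2010"})
```

When the metrics agree on the same k, as in this toy input, that value is the natural choice; when they disagree, the decision remains a judgment call.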

To further evaluate the model, we first look at the probability of each word being generated from a topic. This can be evaluated by looking at the per-topic-per-word probability, called β (“Beta”). To get a better understanding of the content of each topic, we look at the 10 most common terms of each topic, check whether the different topics make sense and try to come up with an overarching theme. Secondly, LDA also models each document as a mixture of topics, which can be evaluated by looking at the per-document-per-topic probabilities, called γ (“Gamma”).
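To make the role of β and γ concrete, the sketch below ranks terms per topic from a β matrix and picks the dominant topic of one document from its γ values. The vocabulary, the β values and the helper names top_terms and dominant_topic are hypothetical and serve only as an illustration.

```python
def top_terms(beta, terms, n=10):
    """beta[t][w] = P(word w | topic t). Return the n most probable terms per topic."""
    result = []
    for topic_probs in beta:
        ranked = sorted(range(len(terms)), key=lambda w: topic_probs[w], reverse=True)
        result.append([terms[w] for w in ranked[:n]])
    return result

def dominant_topic(gamma_row):
    """gamma_row[t] = estimated share of a document's words drawn from topic t."""
    return max(range(len(gamma_row)), key=lambda t: gamma_row[t])

terms = ["beer", "chicken", "band", "night"]
beta = [
    [0.50, 0.05, 0.05, 0.40],  # hypothetical "party" topic
    [0.10, 0.80, 0.05, 0.05],  # hypothetical "food" topic
    [0.05, 0.05, 0.60, 0.30],  # hypothetical "performance" topic
]
top2 = top_terms(beta, terms, n=2)
main = dominant_topic([0.41, 0.07, 0.52])  # index 2 is the third topic
```

Each row of β sums to one over the vocabulary, and each document's γ values sum to one over the topics.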

13.5 Supervised algorithm

This section focuses on the classification algorithm used for the SL models. We make use of RF together with a Singular Value Decomposition (SVD) step to reduce the number of columns. As mentioned before, we apply RF twice: once to check the performance of our model, and a second time to label our entire dataset.

The following 4 steps are repeated for each variable of a classification approach before RF is applied (for testing the performance on the test data, as well as for labelling the entire dataset).

1. Make a corpus for the training data and for the test/entire data.

2. Create training and test/entire data with n-grams.

3. Make DTMs from the training and test/entire data so that they have the same terms.

4. Convert the DTMs to common sparse matrices and apply Singular Value Decomposition.

13.5.1 Singular Value Decomposition

SVD, the decomposition underlying latent semantic indexing, is a widely used technique to reduce the number of columns (Klema & Laub, 1980). It states that every m x n matrix A can be rewritten as

$$A = U \Sigma V^{T}$$

with U: the left singular vectors (orthogonal matrix, m x r), Σ: the singular values (non-negative diagonal matrix, r x r) and V^T: the right singular vectors (orthogonal matrix, r x n). Also Σ = diag(σ1,…,σr), r = min(m,n), with σ1 ≥ … ≥ σr ≥ 0 the singular values. We make use of the irlba package from Lewis (2019) and set the number of vectors equal to 50. This results in a matrix with the concept loadings of each message on each vector, which is used as the new matrix on which we apply RF. The main reason for applying this technique is to reduce the complexity of our DTM by reducing the number of columns, which represent the words that appear in the messages.
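As a sketch of the reduction step, the rank-k truncation with k = 50 can be written as follows. The assumption here, which the text does not state explicitly, is that the reduced representation of the messages is taken as the document scores U_k Σ_k; the symbols A_k, U_k, Σ_k, V_k and D are introduced only for this illustration.

```latex
A \approx A_k = U_k \Sigma_k V_k^{T}, \qquad
U_k \in \mathbb{R}^{m \times k}, \quad
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k), \quad
V_k \in \mathbb{R}^{n \times k}, \quad k = 50

D = U_k \Sigma_k = A V_k \in \mathbb{R}^{m \times k}
```

Under this assumption each of the m messages is described by k = 50 columns instead of one column per word, and new messages can be projected into the same space by multiplying their DTM rows with V_k.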

13.5.2 Random Forest

RF is an ensemble of decision trees which are trained with a bagging method and random variable selection (Breiman, 2001). The concept of bagging is that successive trees do not depend on earlier trees. Each tree in the forest is constructed using a bootstrap sample (i.e. sampling with replacement) and the prediction is based on the majority vote (Liaw & Wiener, 2002). Bagging improves stability and accuracy by reducing variance, while boosting mainly reduces bias and variance. In other words, RF builds multiple decision trees and merges them together to increase performance and to get a more stable prediction (Donges, 2018). A binary recursive partitioning algorithm is used to construct the trees of the forest. At each node, it searches for the best possible split to create a binary partitioning of the data, considering only a randomly chosen subset of the predictor variables (Donges, 2018). In practice, the best split s is the one that maximizes the decrease in impurity of the parent node τ, as measured by the Gini index p(1 − p) (Berk, 2007), as follows:

$$\Delta I(s, \tau) = p_{\tau}(1 - p_{\tau}) - \frac{|\tau_L|}{|\tau|}\, p_{\tau_L}(1 - p_{\tau_L}) - \frac{|\tau_R|}{|\tau|}\, p_{\tau_R}(1 - p_{\tau_R})$$

with τ the cases in the parent node, τ_L the left child node cases, τ_R the right child node cases, p short for P(Y = 1) with Y ∈ {0,1} (the subscript indicating the node in which it is computed) and | . | representing cardinality (number of elements of the set). Finally, predictions are made by aggregating the predictions of the trees by majority vote. The model iterations can be summarized by the following steps (Liaw & Wiener, 2002):

1. Draw ntree bootstrap samples from the original dataset.

2. For each bootstrap sample, grow an un-pruned tree, choosing at each node the best split among a random subset of mtry predictors, with mtry the number of predictors sampled at each node.

3. Predict new data using majority votes.
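The split criterion and the voting step described above can be sketched in Python as follows. This is a minimal illustration for binary labels, not the randomForest implementation; the function names and the toy labels are assumptions made for the example.

```python
from collections import Counter

def gini(labels):
    """Gini index p(1 - p) for binary labels in {0, 1}; 0.0 for an empty node."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return p * (1 - p)

def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two child nodes."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)

def majority_vote(tree_predictions):
    """Aggregate the class predicted by each tree into one forest prediction."""
    return Counter(tree_predictions).most_common(1)[0][0]

parent = [1, 1, 1, 0, 0, 0]
pure_split = impurity_decrease(parent, [1, 1, 1], [0, 0, 0])  # separates classes fully
poor_split = impurity_decrease(parent, [1, 0, 0], [1, 1, 0])  # leaves both children mixed
forest_label = majority_vote([1, 0, 1, 1, 0])
```

A split that separates the classes completely removes all impurity of the parent node, so it yields a larger decrease than a split that leaves both children mixed.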

RF has some advantages over other classification algorithms. First of all, it is a user-friendly algorithm: only two parameters need to be set, the number of trees and the number of variables tried at each split. We follow the standard guidelines from Breiman (2001) and set the number of trees high (1001) and use the square root of the number of variables as the number of candidates for each split (mtry = √p, the default). Since we apply SVD to reduce the number of columns (words) in our document matrix, p equals the number of vectors. Secondly, RF is very robust to overfitting (which occurs when a model is so complex that it fits the training data closely but generalizes poorly to the test data) and it can deal with a large number of features since there are a lot of trees in the forest (Breiman, 2001; Donges, 2018). This is important to our model since we have a lot of predictors (words) to be evaluated. We use the randomForest package by Liaw & Wiener (2012) to apply this algorithm in RStudio.

13.5.3 Performance

To evaluate the performance of our different classification approaches we make use of a cut-off independent measure. The classifier outputs the probability that a given event will take place. Applied to our data, this is the probability that a given message belongs to a classification variable (e.g. emotional or information). Evaluation measures based on a single cut-off are highly sensitive to the chosen cut-off value. Therefore, the performance of the model is evaluated by the Area Under the Receiver Operating Characteristic Curve (AUC). The calculation of the AUC is based on a comparison between the predicted status and the actual status of an event for all possible cut-off values between 0 and 1 (Larivière & Van Den Poel, 2005). The AUC is defined as follows:

$$\mathrm{AUC} = \int_0^1 \frac{TP}{TP + FN}\, d\frac{FP}{FP + TN} = \int_0^1 \frac{TP}{P}\, d\frac{FP}{N}$$

with TP: True Positives, FN: False Negatives, FP: False Positives, TN: True Negatives, P: Positives and N: Negatives. The AUC can be represented graphically by plotting the true positive rate against the false positive rate, which is equal to plotting the sensitivity (TP/P) against one minus the specificity (1 − TN/N = FP/N) across all possible thresholds (between zero and one). The area under this curve is used to evaluate the accuracy of the model (Hanley & McNeil, 1982). For a useful classifier, the values of the AUC range between 0.5 and 1. A value equal to 1 means that the model predicts perfectly, while a value equal to 0.5 means that the model does no better than random selection. The AUC will be our main evaluation metric for each variable of each classification approach.
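As a small illustration of why the AUC does not depend on a single cut-off, the sketch below computes it through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This is equivalent to the area under the ROC curve. The function name auc and the toy scores are assumptions made for the example.

```python
def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count positive/negative pairs ranked correctly; ties count as one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [1, 1, 0, 0, 1, 0]                    # actual class of six toy posts
s = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]       # predicted probabilities
value = auc(y, s)
perfect = auc([1, 0], [0.9, 0.1])
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a test set of a few hundred posts but would be replaced by a sort-based computation on larger data.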

We can also look at the variable importance to see which variables have the most predictive power. This can be calculated by the Mean Decrease in Accuracy (MDA) and the Mean Decrease in Gini (MDG). For each tree, the MDA takes the difference between the Out-of-Bag (OOB) accuracy of the model and the accuracy obtained when the values of the variable are scrambled; these differences are averaged over all trees to get a single value. The OOB accuracy (the OOB data being the data not in the bootstrap sample at each bootstrap iteration) is measured by the proportion of instances correctly classified, compared to the total number of instances (Liaw & Wiener, 2002). The MDG is equal to the average of a variable’s total decrease in node impurity over all trees in the forest, as measured by the Gini index. Impurity is calculated for each child node and compared with the original node (‘Random Forests’, n.d.). Variables that split mixed nodes into purer nodes will have a higher MDG, which makes them more important variables. These measures are less important for our study, first because we concentrate on the overall performance of the different models, and secondly because we used SVD to reduce the number of columns, so these metrics show the importance of the vectors and not of the words.

13.6 Basic impact analysis

The last part of our research focuses on the relationship between the different classification approaches and BPP (number of reactions, comments and shares). Whereas previous sections used only the 1000 labelled BP’s to evaluate the performance of an approach, the impact analysis makes use of the entire dataset (240 210 message posts). Our DV’s are count data resembling a Poisson distribution with a lambda lower than or equal to one, since the number of observations decreases exponentially as the number of reactions, comments or shares increases. Exhibit 3 plots the count of the DV’s, also taking the difference between the levels of the information variable into account. A lot of BP’s produce no reactions, comments or shares, while only a few posts have high counts. The lambda parameter sets the shape of the curve of the DV and is equal to both the mean and the variance of the count variable (Cameron & Trivedi, 2005). Secondly, our DV’s are over-dispersed count data since the (conditional) variance is higher than the (conditional) mean (e.g. the number of comments: M = 0.8, SD = 7.48). Since our DV’s have a highly right-skewed distribution and the residual errors do not follow a normal distribution, applying a normal linear regression is not suitable for our data. To overcome this problem we make use of two adjusted regression models.

Ordinary Least Squares Regression (OLSR) - Our first model makes use of OLSR where the log is taken of the DV + 1 (since the log of zero is undefined). The log is also taken of the number of words (control variable) since this variable also has an exponentially decreasing distribution. OLSR minimizes the sum of squared residuals (the differences between the observed response values and the values predicted by the model) (Gareth, Witten, Hastie, & Tibshirani, 2017). We use a stepwise mixed method for the selection of the variables of the model. Still, this model lacks the capacity to model the dispersion. An F-test is applied to check whether the OLSR model is significant (it checks whether at least one of the regression coefficients is not zero). R square tells us more about the fit of the model, measuring the proportion of variance in the DV explained by the IV’s. More specifically, we look at the adjusted R square, which adjusts R square for the number of parameters taken into account (Gareth et al., 2017).
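The log(DV + 1) transformation and the adjusted R square can be illustrated with a single-predictor regression in Python. This is a simplified sketch with one dummy predictor and made-up counts, not the stepwise multi-variable model of the thesis; the function name fit_log_ols is hypothetical.

```python
import math

def fit_log_ols(x, y):
    """Regress log(y + 1) on one predictor x; return (intercept, slope, r2, adj_r2)."""
    z = [math.log1p(v) for v in y]  # log(DV + 1), defined for zero counts
    n = len(x)
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    ss_res = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    ss_tot = sum((zi - mean_z) ** 2 for zi in z)
    r2 = 1 - ss_res / ss_tot
    k = 1  # number of predictors, excluding the intercept
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return intercept, slope, r2, adj_r2

# Toy data: x is a 0/1 content dummy, y a count such as the number of reactions.
x = [0, 0, 0, 1, 1, 1]
y = [0, 1, 2, 3, 8, 20]
intercept, slope, r2, adj_r2 = fit_log_ols(x, y)
```

With a dummy predictor, the intercept is the mean of log(y + 1) in the group where the dummy equals zero and the slope is the difference in group means on the log scale.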

Negative binomial regression (NBR) - An NBR can be used when data are highly over-dispersed, which is still an issue when using OLSR. Compared to a Poisson regression (which assumes that the variance is equal to the mean), it has an extra parameter to model over-dispersion, which accounts for the variance of the count data being higher than the mean (‘NEGATIVE BINOMIAL REGRESSION | R DATA ANALYSIS EXAMPLES’, n.d.). The model is estimated by finding the β coefficients for the different variables that maximize the likelihood function. A Chi-square test is applied to test the overall significance of the model, checking whether there is a difference between the residual deviance and the null deviance of the specific NBR model. The test checks whether the lack of fit is reduced by adding variables to the intercept-only model. Secondly, we also look at the Akaike Information Criterion (AIC), which is another measure of the fit of the model (‘NEGATIVE BINOMIAL REGRESSION | R DATA ANALYSIS EXAMPLES’, n.d.). AIC estimates the relative information loss of a given statistical model. When different models are fitted to the same data with the same DV, the model with the lowest AIC is considered the “best” model (Arnold, 2010).
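Two of the ideas in this paragraph can be sketched briefly in Python: a quick over-dispersion check (variance divided by mean) and the AIC comparison between candidate models. The counts and the log-likelihood values are hypothetical and only illustrate the calculation; they are not results from the thesis.

```python
def dispersion_ratio(counts):
    """Sample variance divided by the mean; values well above 1 suggest over-dispersion."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L); lower is better on the same data."""
    return 2 * n_params - 2 * log_likelihood

# Toy counts: many zeros and one very popular post.
counts = [0, 0, 0, 0, 1, 0, 2, 0, 0, 37]
ratio = dispersion_ratio(counts)

# Hypothetical maximised log-likelihoods of two models fitted to the same DV.
candidates = {"poisson": aic(-310.4, 3), "negative_binomial": aic(-150.2, 4)}
preferred = min(candidates, key=candidates.get)
```

The extra dispersion parameter of the negative binomial model costs one additional parameter in the AIC penalty, so it is preferred only when the gain in log-likelihood outweighs that penalty.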

Table 13: Different models of impact approaches

                                      Reactions     Shares        Comments
                                      OLSR  NBR     OLSR  NBR     OLSR  NBR
Multi Approaches model *              x     x       x     x       x     x
Isolated models
  Topic Model *                       x     x       x     x       x     x
  Base Content Approach *             x     x       x     x       x     x
  Content Approach *                  x     x       x     x       x     x
  Message Strategy Approach *         x     x       x     x       x     x
  Marketeer's Orientation Approach *  x     x       x     x       x     x

*Each approach also takes the 2 control variables into the model (number of words & weekendday)

To check the relationship between each categorization variable of the different approaches and BPP, we use the following procedure. First, we build an overall model for each DV (reactions, shares or comments), taking all the different classification variables into account (Multi Approaches model) (MAM). Still, since information is a category in the BCA as well as in the CA, we remove the information variable of the CA from the MAM to avoid overlap. By looking at the MAM, we assume that there is not much overlap between the other classification types. Secondly, to relax this assumption, we also test each approach independently (Isolated model) (IM) on the number of reactions, shares and comments. Table 13 gives an overview of the different models which will be tested.

13.6.1 Independent variables

Hopkins & King (2010) stated that only a few hundred documents need to be hand coded as training data, after which this set can be used to label the larger population. They showed that beyond around 500 documents the gain in performance becomes marginal. Coding more than about 500 documents in the training data therefore seems unnecessary when you are under time pressure. However, to increase the performance of our model, we manually coded 1000 Facebook posts per category. We build a prediction model for each variable of the approaches which are based on supervised learning. Next, we apply the model to our entire dataset. Whereas the previous section focused on the evaluation of the different approaches and made use of a cut-off independent measure, here we want a deterministic binary prediction of whether a message belongs to a category or not {0,1}. Our RF model therefore makes use of a single cut-off to obtain a deterministic binary classification. The outcome of the RF is 12 categorical factor variables, each consisting of 2 levels (e.g. MD0_INF is equal to one if the message contains information, zero otherwise).
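The step from predicted probabilities to a binary dummy can be sketched as follows. The cut-off of 0.5 is an assumption made for this illustration, since the value used is not stated here, and the helper name to_dummy and the probabilities are hypothetical.

```python
def to_dummy(probabilities, cut_off=0.5):
    """Map predicted class probabilities to 0/1 labels using a single cut-off."""
    return [1 if p >= cut_off else 0 for p in probabilities]

# Hypothetical predicted probabilities that four posts contain information.
md0_inf = to_dummy([0.91, 0.12, 0.50, 0.49])
```

Changing the cut-off shifts posts between the two levels of the dummy, which is why the performance evaluation relied on a cut-off independent measure while the labelling step has to commit to one value.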

Table 14: Independent variables

Variable            Description

Topic Approach
TM_1                Dummy variable = 1 if content of the message contains the party topic, 0 otherwise.
TM_2                Dummy variable = 1 if content of the message contains the food topic, 0 otherwise.
TM_3                Dummy variable = 1 if content of the message contains the performance topic, 0 otherwise.

Base Content Approach
MD0_INF             Dummy variable = 1 if content of the message contains information, 0 otherwise.
MD0_TRA/ENT         Dummy variable = 1 if content of the message contains transaction/entertainment, 0 otherwise.

Content Approach
MD1_INF             Dummy variable = 1 if content of the message contains information, 0 otherwise.
MD1_ENT             Dummy variable = 1 if content of the message contains entertainment, 0 otherwise.
MD1_TRA             Dummy variable = 1 if content of the message contains transaction, 0 otherwise.

Message Strategy Approach
MD2_FUN             Dummy variable = 1 if content of the message contains functional, 0 otherwise.
MD2_EXP             Dummy variable = 1 if content of the message contains experiential, 0 otherwise.
MD2_EMO             Dummy variable = 1 if content of the message contains emotional, 0 otherwise.
MD2_BRA             Dummy variable = 1 if content of the message contains brand resonance, 0 otherwise.

Marketeer's Orientation Approach
MD3_TAS             Dummy variable = 1 if content of the message is task-oriented, 0 otherwise.
MD3_REL             Dummy variable = 1 if content of the message is relationship/interaction-oriented, 0 otherwise.
MD3_SEL             Dummy variable = 1 if content of the message is self-oriented, 0 otherwise.

Control variables
feed_word_length    Number of words of the message.
weekendday          Dummy variable = 1 if message is created during a weekend day (Saturday or Sunday), 0 otherwise.


Timing of a post has commonly been taken into account as a control variable. Cvijikj & Michahelles (2013) found a positive relationship between a post created during the week and the number of comments, while de Vries et al. (2012) found no significant relationship. We take the day of the week (whether the post is created during the week or in the weekend) into account as a control variable. Advertising literature also suggests that message length or the number of words affects BPP (de Vries et al., 2012); therefore we also take the number of words of a BP into our model as a control variable.

13.6.2 Dependent variables

BPP or CE has commonly been used in SM studies as a DV. BPP mostly consists of the number of likes and the number of comments for a specific post (de Vries et al., 2012; Sabate et al., 2014; Shen & Bissell, 2013; Swani et al., 2017). How many times a post is shared on SM is another variable which has been used as a DV (Cvijikj & Michahelles, 2013; Kim et al., 2015; Tafesse, 2015). Whereas de Vries et al. (2012) used the number of likes and comments in absolute values, the studies of Cvijikj & Michahelles (2013) and Kim et al. (2015) made use of the like ratio, comment ratio and share ratio as DV’s, which adjust the variables for the number of brand fans. Lee et al. (2018) also added the number of click-throughs as a DV of the online engagement score. To operationalize BPP, which stimulates online CE, we make use of the following three dependent variables: the number of reactions, shares & comments.

Companies that make efficient use of their Facebook brand pages through a well-established online marketing strategy can strengthen their customer relationships. Moreover, followers who react to a post or share a post feel more involved with the company. Nowadays, people who contribute to the online engagement of a specific brand page can even earn the “super fan” label (Porterfield, 2011). The sharing of Facebook posts also enlarges the reach of the message, which can result in new potential customers. Our first DV is the number of reactions, whereas previous research used the number of likes.

Figure 8: Reactions possibilities on a Facebook post (Krug, 2016)

In February 2016 Facebook launched its reaction buttons worldwide. They are an extension of the like button, giving you the opportunity to react quickly and easily in more ways (Krug, 2016). Besides the like button, you can also click on the love, haha, wow, sad or angry reaction (Figure 8). Since our data was collected between the beginning of 2011 and the end of 2016, some posts will also contain reactions other than likes. However, most of the reactions will still be likes since the other 5 reactions were not available during the first 5 years of our data collection. A random look at the reactions to posts in our data created after the 6 reaction possibilities were launched made it clear that only the like button was used in the posts we checked. Secondly, since our data mostly comes from restaurants and bars which share their specific events and promotions, we assume that the sad and angry buttons have not been used a lot. We therefore assume that the number of reactions is still equal to the number of likes in our study. The number of shares of a brand message indicates how many persons have reposted the brand post on their own personal page. The number of comments refers to how many comments in total the message has.

14 Results

14.1 Descriptive statistics

An overall review of the descriptive statistics can be found in Exhibit 4. To get a first impression of our data, we looked at which words appear most often in the messages of the brand pages, in order to better understand what brand pages post about and, more specifically, which words are used most in the bar/restaurant industry. The five most used words are tonight (N = 42 823), come (N = 42 609), night (N = 40 106), will (N = 35 914) and day (N = 30 086). To get an even better understanding of our data, we first set a condition to take into account only words that appear in at least 3% of all posts. Secondly, we filtered out some words that add little to understanding the content of the messages (e.g. come, get, will, see etc.). Exhibit 4 contains 3 word clouds (first: no conditions; second: 3% condition; third: 3% condition and less meaningful words removed). Based on the word clouds, we can see that most of the posts contain information about the timing of events (tonight, night, hour, Friday, weekend, day etc.), bars (drink, beer, music, band etc.) and promotions (free, special etc.).

Companies use different types of messages to stimulate engagement on Facebook. With regard to the topics, posts about party (topic 1, 40% of total) were most frequently used, followed by performance (topic 3, 36% of total) and food (topic 2, 7% of total). The topic names are explained further in the next section. In terms of our base model, posts providing transaction and entertainment content were most common (148 635 occurrences, 62% of total). Concerning the CA, entertainment posts appeared the most (54%), followed by information content posts (20%). Only a few posts were classified as transaction (7 375 occurrences, 3%). Messages containing experiential content dominate the Message Strategy Approach (66%), while the other 3 each occurred in less than 1% of the total posts (functional: 0.01%, emotional: 0.07% and brand resonance: 0.10%). As mentioned before, this comes from the fact that our data comes from restaurants and bars which post mostly experiential (events) content. In terms of the MOA, task-oriented posts were most frequently used (150 286 occurrences, 63%), followed by relationship/interaction-oriented posts (0.49%) and self-oriented posts (0.40%).

To get a first insight into BPP on Facebook, we looked at the descriptive statistics of the three dependent variables (number of reactions, number of comments and number of shares). In general, fans engage online with the brand in the form of automated reactions (M = 7.598, SD = 32.42) more frequently than in the form of comments (M = 0.803, SD = 7.41) or shares (M = 1.238, SD = 30.82). Most BP’s were placed on a Friday, while Sunday is the day on which the fewest posts are published. This is probably due to the fact that bars and restaurants promote their events on Friday evening to start the weekend, while Sunday is mostly a hangover day. A little more than 25% of the brand messages were posted during the weekend. A Facebook post contains on average around 150 characters (M = 145.8, SD = 263.04) and consists of around 25 words (M = 24.91, SD = 42.75).

14.2 Topic model

The document term matrix described in the previous section was the basis for our topic modelling (UL approach). Words that did not appear in at least 0.5% of the Facebook posts were removed to get rid of words that are not useful. This reduced the number of columns, which represent words, from 80 574 to only 482 important terms.

Figure 9: Optimal number of topics

Based on the ldatuning package from Murzintcev (2019), we came up with 3 topics (Figure 9). The Griffiths2004 and Arun2010 metrics are not informative, since they go from zero to one or from one to zero and show no fluctuations over the number of topics. If we look at the other 2 metrics, we can see that the CaoJuan2009 metric is minimized at 3 topics and the Deveaud2014 metric is maximized at 3 topics. In practice, the number of topics is usually larger (Silge & Robinson, 2017). Since our research wants to compare the different models, we wanted to limit the number of possible topics. A model consisting of around 100 topics would be too big to analyse and would make it hard to see the overarching themes. To get a first understanding of the different topics, we wanted to give a general name to each topic. By asking people which word comes to mind when shown the 10 most common words of each topic (Figure 10), we arrived at the following topic names. (1) PARTY: words that came to mind when people were shown the common terms of topic 1 were friends, leisure, activity, planning, events, opening of bars and restaurants. In our view, the best overall topic name is party, since the topic mainly concerns opening hours, promotions and drinks. (2) FOOD: some words which are commonly used in topic 2 are cheese, chicken, salad and onion, which are food ingredients. It also covers some food promotions (today & special). (3) PERFORMANCE: the third topic concerns performances (concerts & gigs). The top 10 terms of a topic are the words with the highest word-topic probabilities, measured by the β parameter (Figure 10). For each possible combination of a word and a topic, the β coefficient is equal to the probability of that term being generated from that topic (Silge & Robinson, 2017). Take, for example, the first word from our document term matrix, “acoust”, derived from acoustic. The word reminds us of sound and hearing. That is why we would match this word mostly with performance, a little with party and least with food. If we look at the beta values of acoust for all 3 topics, we can confirm this statement. The term has a 1.653566e-03 probability of being generated from the performance topic, while it has only a 4.219594e-32 probability of being generated from the food topic.

Figure 10: Top 10 terms of each topic

Besides looking at the β parameter, we can also look at the document-topic probabilities by analysing the γ parameter. It gives us the estimated proportion of words from a specific message that are generated from topic 1, 2 or 3. A message post from our data has the following content: “Make plans to join us Saturday from 3-9 for the Pooper Party, great band from superior wi, playing oldies, country, and good ole rock and roll, lunch, snacks, hats, horns, champagne, plus win cash and prizes, bring on 2017 a bit early this year plus another party at midnight for all.” About 41% of the words were generated from topic 1 (party), 7% from topic 2 (food) and 52% from topic 3 (performance). Based on the content of the message, we can confirm that this message is more about party and performance than about food. Based on these results, we conclude that our topic model performs reasonably well. First of all, it made sense to come up with 3 different themes when we looked at the 10 most common words of each topic. Secondly, a closer look at the β and γ coefficients of our model confirmed that our 3 topics are representative.

14.3 Random Forest

As mentioned before, four classification approaches were built. Since this section focuses on the performance of the different classification approaches, we mainly concentrate on the AUC of each variable within each classification framework. AUC measures the performance of a binary classifier; consequently, we applied RF separately to every variable of each proposed classification approach. In total, an RF model was built 12 times to test the prediction efficiency of each variable. A deeper analysis is given only once, for the first variable of the BCA (the information model), by looking at the OOB error, the confusion matrix and the variable importances. Tuning is also applied to this first model to see whether it can be improved. The labelled dataset was used to test the performance of the model: it was randomly split into a training set of 800 posts and a test set of the remaining 200 posts. Because our research focuses on comparing the different classification approaches, AUC is our primary evaluation metric.
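As a reminder of what the AUC expresses, it can be computed as the probability that a randomly chosen positive post receives a higher predicted score than a randomly chosen negative post. The sketch below is a plain pairwise implementation written for illustration only; the labels and scores are made-up toy values and the function name auc is our own, not the routine used to produce the results in this thesis.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 class labels.
    scores: iterable of predicted scores (higher means more likely class 1).
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical values): two positive and three negative posts.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.90, 0.40, 0.35, 0.20, 0.10]
print(round(auc(toy_labels, toy_scores), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why the metric can be compared across variables with very different class balances.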

14.3.1 Information model

Our first model classifies messages into information and non-information. The two parameters of RF were set to 1001 (number of trees) and 7 (number of variables tried at each split; this equals the rounded square root of 50, the total number of vectors of our base matrix). The OOB error is the mean prediction error on the data not included in the bootstrap sample, over each bootstrap iteration and its related tree (Liaw & Wiener, 2002). The OOB error of our information model equals 20.15%. This rate is on the boundary of acceptable, so we will check whether tuning the RF model can decrease it. The class error for non-information is very good (3%), but the class error for information is too high (74%): 51 messages were correctly classified as information (TP), while 143 information messages were incorrectly classified as non-information (FN). The accuracy on the test data equals 83%, which is in line with the good AUC of our model. To understand which SVD vectors drive the results, we looked at the variable importances as measured by the MDG and the MDA. For both metrics, v1, v10 and v32 are the most important variables (vectors) of our model (Exhibit 5). As mentioned before, this is less useful for our thesis, firstly because we focus on the performance of the different classification approaches and secondly because SVD reduced the words to vectors, which are harder to interpret.

Tuning of the model did not improve the OOB error rate. As a result, we will keep the original prediction

model with 1001 trees.
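The two numbers that characterise this model, the number of variables tried at each split and the class error for information, can be reproduced from the figures reported above. The snippet is a small arithmetic sketch; the variable names are our own, and the counts are those of the confusion matrix quoted in the text.

```python
import math

# Number of SVD vectors in the base matrix, as reported in the text.
n_features = 50
# Common default for classification forests: floor of the square root.
mtry = math.floor(math.sqrt(n_features))

# Confusion-matrix counts for the information class, as reported in the text.
true_positives = 51    # information posts classified as information
false_negatives = 143  # information posts classified as non-information

# Class error = share of actual information posts that were misclassified.
class_error_information = false_negatives / (true_positives + false_negatives)

print(mtry)
print(f"{class_error_information:.1%}")
```

The resulting class error of roughly 74% shows that the acceptable overall OOB error is driven by the large non-information class.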

14.3.2 Evaluation supervised approaches

In what follows, we evaluate and compare the different supervised classification approaches. As mentioned before, we look at the AUC values (Table 15) and at the AUROC-curves (Figure 11) of each classification model, since AUC is the most important metric for checking the performance of classification labelling. The AUCs for the information variable and the transaction/entertainment variable of the Base Content Approach are quite good: 0.82 and 0.72 respectively. The Content Approach, which splits the transaction/entertainment category of the BCA into two separate categories, yields an AUC of 0.87 for the entertainment model and an AUC of 0.83 for the transaction model, which is very good. The AUC of the information variable of the CA is 0.78, slightly lower than the AUC of information in the BCA. This small deviation is due to the fact that a new random training and test set was drawn for each variable model. Splitting transaction/entertainment into transaction and entertainment has made them better prediction variables (Figure 11(a) and Figure 11(b)). The AUC curves of entertainment and transaction of the CA lie higher than that of information, while the

information AUC curve was higher than the transaction/entertainment curve in the BCA.

Table 15: AUC's of the different classification variables

Approach                            Variable       AUC
Base Content Approach               MD0_INF        0,82
                                    MD0_TRA.ENT    0,72
Content Approach                    MD1_INF        0,78
                                    MD1_ENT        0,87
                                    MD1_TRA        0,83
Message Strategy Approach           MD2_FUN        0,53
                                    MD2_EXP        0,93
                                    MD2_EMO        0,72
                                    MD2_BRA        0,86
Marketeer's Orientation Approach    MD3_TAS        0,83
                                    MD3_REL        0,77
                                    MD3_SEL        0,91

If we look at the AUC values of the Message Strategy Approach, which classifies message posts into functional (AUC = 0.53), experiential (AUC = 0.93), emotional (AUC = 0.72) and brand resonance (AUC = 0.86), we see a low value for the functional variable. As mentioned before, a value lower than 0.5 can point out that the model may be overfitted. Out of the 1000 posts that were manually coded, only 22 posts were coded as functional. The low AUC can therefore be due to the small number of functional examples in our training data, which made our model too strict in classifying messages into the functional category. Secondly, the AUC curve of the functional model sometimes drops below the diagonal, which means that a random classification would perform better than our model in those regions (Figure 11(c)). Thirdly, as stated in the human classification section, the functional variable is not the most suitable variable to use

on our data coming from bars and restaurants, which is confirmed by the low AUC. Apart from the low performance of functional, the other variables perform very well. Even the model for brand resonance, which made up only 6% (57 posts) of the labelled dataset, has a good prediction performance (AUC = 0.86). Our last approach (Marketeer’s Orientation Approach) is also a good approach for message classification since its AUCs are quite high (Table 15). Self-oriented posts, which occurred in only 8% of the labelled data, have the best classification performance (AUC = 0.91), followed by task-oriented messages (AUC = 0.83) and relationship/interaction-oriented messages (AUC = 0.77) (Figure 11(d)).

Figure 11: AUC-curves of the different classification approaches

14.4 Basic impact approach

14.4.1 Model evaluation

Table 16 gives an overview of the evaluation parameters (adjusted R squared for OLSR and AIC for NBR) concerning the fit of all the different models. The adjusted R squared ranges between 1.20% and 2.74% for the OLSR models, so most of the variation in Y is not explained by the variation in the IVs. At first sight, these low values look concerning. However, certain elements have to be taken into account. In some fields a lower R squared is to be expected, especially in a field like ours where we want to predict human behaviour (‘How to Interpret a Regression Model with Low R-squared and Low P values’, 2014). Predicting whether a person will like a post is more difficult than making predictions in the “pure science” field, where predictions need to have a high degree of accuracy. BPP also depends on factors other than the categorization type of the post, the number of words and whether the post was published in the weekend (e.g. state of mind, social pressure, personal interest). We can still draw important conclusions, since some predictors are statistically significant: we can explain how changes in the IVs are associated (positively, negatively or not at all) with BPP. It is not worthwhile to report the precise predicted effect of a specific message, since the spread of the data points around the regression line is quite high.

Table 16: Evaluation of the different approaches

                                      Reactions                           Shares                              Comments
                                      OLSR       NBR      NBR             OLSR       NBR      NBR             OLSR       NBR      NBR
                                      adj. R^2   NR^2     AIC             adj. R^2   NR^2     AIC             adj. R^2   NR^2     AIC
Multi Approaches Model *              2,41%      2,41%    1.324.565       2,74%      2,91%    569.160         2,51%      4,27%    459.845
Isolated models
  Topic Model *                       2,23%      2,19%    1.325.087       2,43%      2,15%    570.905         2,02%      3,19%    462.134
  Base Content Approach *             1,60%      0,71%    1.328.659       0,02%      1,67%    571.890         1,24%      2,02%    464.584
  Content Approach *                  1,65%      0,84%    1.328.347       2,37%      2,15%    570.840         1,55%      2,30%    464.001
  Message Strategy Approach *         1,70%      0,51%    1.329.148       2,05%      1,53%    572.216         1,20%      1,26%    466.172
  Marketeer's Orientation Approach *  1,69%      0,54%    1.329.084       2,07%      1,62%    572.015         1,39%      1,50%    465.684

* Each approach also takes the 2 control variables into the model (number of words & weekend day)
adj. R^2 for OLSR; AIC & Nagelkerke R^2 (NR^2) for NBR

Secondly, we looked at the AIC values of the NBR models. Unlike the R squared of a linear regression model, the AIC of a single model has little meaning on its own. AIC is used to compare different models for the same data and the same DV. The model with the lowest AIC, and thus the lowest loss of information, is regarded as the “best” model. Models with a higher AIC parameter are perceived as more complex models (‘NEGATIVE BINOMIAL REGRESSION | STATA DATA ANALYSIS EXAMPLES’, n.d.). We also added the Nagelkerke R squared as an extra check, which compares the likelihood of the full model with the likelihood of an intercept-only model (Mangiafico, 2019). Based on the two parameters of fit for OLSR and NBR (and the confirmation of the Nagelkerke parameter), the MAM scores best, followed by the topic approach model and the CA model. The AIC values of the reactions models are also higher than those of the shares and comments models (Table 17). Since we are dealing with low R squared values that differ only slightly, we have to be careful with making exact predictions.
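The AIC comparison described above can be sketched in a few lines. The values below are the NBR AICs of the reactions models as listed in Table 16; the helper rank_by_aic is a hypothetical name used only for this illustration.

```python
# AIC of the NBR reactions models, copied from Table 16.
aic_reactions = {
    "Multi Approaches Model": 1_324_565,
    "Topic Model": 1_325_087,
    "Base Content Approach": 1_328_659,
    "Content Approach": 1_328_347,
    "Message Strategy Approach": 1_329_148,
    "Marketeer's Orientation Approach": 1_329_084,
}


def rank_by_aic(aic_by_model: dict) -> list:
    """Sort models by AIC (lowest first) and report the difference to the best model."""
    best = min(aic_by_model.values())
    ordered = sorted(aic_by_model.items(), key=lambda item: item[1])
    return [(name, value - best) for name, value in ordered]


ranking = rank_by_aic(aic_reactions)
for name, delta in ranking:
    print(f"{name}: delta AIC = {delta}")
```

Only the differences between models fitted on the same data and the same DV are meaningful here, which is why the sketch reports each model relative to the lowest AIC rather than the raw values.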

14.4.2 Estimation results

The estimation results of the MAM are given in Table 17. The estimation results of the IMs are given in Tables 18 through 22. As shown in the tables, all 36 models are significant as a whole (p < 0.001). In addition, different effects on BPP have been found for the categorization types. In what follows, we check the relationship of each categorization with the number of reactions, the number of shares and the number of comments. Our main analysis is based on the model that takes all the categorization variables into account. Still, we also look at how each isolated classification approach relates to BPP on its own.

Table 17: Estimation Results for Brand Post Popularity, Multi Approaches Model

Multi Approaches Model                Reactions (B)             Shares (B)                Comments (B)
                                      OLSR         NBR          OLSR         NBR          OLSR         NBR
(Intercept)                           0,960 **     1,715 **     0,252 **     0,168 **     0,162 **     -0,958 **
Topic approach
  Party                               -0,165 **    -0,018       -0,167 **    -0,634 **    -0,069 **    -0,028
  Food                                -0,111 **    0,174 **     -0,164 **    -0,753 **    -0,057 **    0,086 *
  Performance                         0,039 *      0,466 **     -0,130 **    -0,224 **    0,050 **     0,710 **
Base Content approach
  Information                         0,065 **     0,087 **     -0,016 *     -0,057 *     0,023 **     0,081 **
  Transaction/Entertainment           -0,023 **    0,024 *      -0,021 **    0,095 **     0,144 **     0,267 **
Approach 1: Content
  Information                         x            x            x            x            x            x
  Entertainment                       xx           0,099 **     0,034 **     0,313 **     -0,015 **    0,146 **
  Transaction                         0,109 **     0,363 **     0,125 **     0,827 **     0,144 **     1,026 **
Approach 2: Message Strategy
  Functional                          0,613 *      -0,255       0,586 **     1,064        xx           0,011
  Experiential                        xx           0,092 **     0,041 **     0,229 **     0,030 **     0,142 **
  Emotional                           0,597 **     0,287 *      xx           -0,078       0,178 **     0,061
  Brand Resonance                     0,304 **     0,253 *      xx           -0,048       0,123 *      0,561 *
Approach 3: Marketer's Orientation
  Task-oriented                       -0,028 *     -0,096 **    -0,017 *     -0,143 **    -0,042 **    -0,139 **
  Relationship/Interaction-oriented   0,137 **     0,258 **     0,131 **     0,496 **     0,222 **     0,770 **
  Self-oriented                       0,233 **     0,236 **     xx           -0,164       0,068 **     0,177
Control variables
  Number of words *1                  0,124 **     0,001 **     0,083 **     0,004        0,045 **     0,002 **
  Weekend day                         0,011 *      -0,053 **    -0,061 **    -0,315       -0,015 **    -0,078 **
Performance
  Model sign. *2                      **           **           **           **           **           **
  Adj. R^2                            2,41%        /            2,74%        /            2,51%        /
  AIC                                 /            1.324.565    /            569.160      /            459.845

* p < 0,05, ** p < 0,001
*1: OLSR: number of words is replaced by log(number of words + 1)
*2: OLSR: F-statistic, NBR: Chi square
Unstandardized coefficients are reported in the table
Dependent variable of OLSR is log(dependent+1), dependent variable for NBR is log(dependent)
xx Variable removed after stepwise variable selection
x Information is left out of the model to reduce overlap with the information variable from the base approach
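Because the NBR coefficients are reported on a log scale, they can be converted into incidence rate ratios by exponentiation. The sketch below assumes the usual natural-log link of negative binomial regression, which the table notes suggest but do not state explicitly; the function names are our own illustration, and the coefficient is the NBR estimate for transaction on the number of reactions in Table 17.

```python
import math


def incidence_rate_ratio(coefficient: float) -> float:
    """Multiplicative change in the expected count for a one-unit increase."""
    return math.exp(coefficient)


def percent_change(coefficient: float) -> float:
    """Same effect expressed as a percentage change in the expected count."""
    return (math.exp(coefficient) - 1.0) * 100.0


# NBR coefficient of transaction (Content Approach) for reactions, Table 17.
b_transaction_reactions = 0.363

irr = incidence_rate_ratio(b_transaction_reactions)
print(f"IRR = {irr:.3f}, i.e. about {percent_change(b_transaction_reactions):.0f}% more reactions")
```

Under this assumption, and holding the other variables constant, a post coded as transaction is associated with roughly 44% more expected reactions than a post that is not.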

Table 18: Estimation Results for Brand Post Popularity, Topic Approach

Topic Approach                        Reactions (B)             Shares (B)                Comments (B)
                                      OLSR         NBR          OLSR         NBR          OLSR         NBR
(Intercept)                           0,966 **     1,726 **     0,246 **     0,171 **     0,166 **     -0,915 **
  Party                               -0,152 **    0,076 **     -0,184 **    -0,368 **    -0,050 **    0,237 **
  Food                                -0,090 **    0,262 **     -0,198 **    -0,557 **    -0,025 **    0,339 **
  Performance                         0,067 **     0,563 **     -0,166 **    0,024        0,081 **     1,018 **
Control variables
  Number of words *1                  0,110 **     0,001 **     0,104 **     0,009 **     0,038 **     0,005 **
  Weekend day                         0,012 *      -0,057 **    -0,062 **    -0,329 **    -0,015 **    -0,107 **
Performance
  Model sign. *2                      **           **           **           **           **           **
  Adj. R^2                            2,23%        /            2,43%        /            2,02%        /
  AIC                                 /            1.325.087    /            570.905      /            462.134

* p < 0,05, ** p < 0,001
*1: OLSR: number of words is replaced by log(number of words + 1)
*2: OLSR: F-statistic, NBR: Chi square
Unstandardized coefficients are reported in the table
Dependent variable of OLSR is log(dependent+1), dependent variable for NBR is log(dependent)

Table 19: Estimation Results for Brand Post Popularity, Base Content Approach

Base Content Approach                 Reactions (B)             Shares (B)                Comments (B)
                                      OLSR         NBR          OLSR         NBR          OLSR         NBR
(Intercept)                           0,920 **     1,834 **     0,224 **     0,046 **     0,142 **     -0,822 **
  Information                         0,110 **     0,086 **     -0,067 **    -0,335 **    0,054 **     0,020
  Transaction/Entertainment           -0,028 **    0,153 **     -0,035 **    0,089 **     0,025 **     0,544 **
Control variables
  Number of words *1                  0,109 **     0,003 **     0,068 **     0,008 **     0,040 **     0,008 **
  Weekend day                         0,009 *      -0,055 **    -0,062 **    -0,322 **    -0,016 **    -0,087 **
Performance
  Model sign. *2                      **           **           **           **           **           **
  Adj. R^2                            1,60%        /            2,31%        /            1,24%        /
  AIC                                 /            1.328.659    /            571.890      /            464.584

* p < 0,05, ** p < 0,001
*1: OLSR: number of words is replaced by log(number of words + 1)
*2: OLSR: F-statistic, NBR: Chi square
Unstandardized coefficients are reported in the table
Dependent variable of OLSR is log(dependent+1), dependent variable for NBR is log(dependent)

Table 20: Estimation Results for Brand Post Popularity, Content Approach

Content Approach                      Reactions (B)             Shares (B)                Comments (B)
                                      OLSR         NBR          OLSR         NBR          OLSR         NBR
(Intercept)                           0,923 **     1,819 **     0,219 **     0,045 **     0,152 **     -0,689 **
  Information                         0,058 **     0,256 **     -0,074 **    -0,228 **    0,036 **     0,428 **
  Entertainment                       -0,069 **    0,153 **     0,010 *      0,085 **     -0,059 **    0,226 **
  Transaction                         0,099 **     0,412 **     0,129 **     0,951 **     0,156 **     1,289 **
Control variables
  Number of words *1                  0,119 **     0,003 **     0,058 **     0,006 **     0,054 **     0,006 **
  Weekend day                         0,011 *      -0,052 **    -0,061 **    -0,302 **    -0,016 **    -0,061 **
Performance
  Model sign. *2                      **           **           **           **           **           **
  Adj. R^2                            1,65%        /            2,37%        /            1,55%        /
  AIC                                 /            1.328.347    /            570.840      /            464.001

* p < 0,05, ** p < 0,001
*1: OLSR: number of words is replaced by log(number of words + 1)

Table 21: Estimation Results for Brand Post Popularity, Message Strategy Approach

Message Strategy Approach             Reactions (B)             Shares (B)                Comments (B)
                                      OLSR         NBR          OLSR         NBR          OLSR         NBR
(Intercept)                           0,943 **     1,902 **     0,197 **     -0,039 **    0,163 **     -0,576 **
  Functional                          0,719 *      -0,007       0,643 **     2,109 **     0,253 *      0,427
  Experiential                        -0,128 **    0,092 **     0,036 **     0,105 **     -0,068 **    0,184 **
  Emotional                           0,691 **     0,506 **     0,120 *      0,033        0,224 **     0,241
  Brand Resonance                     0,486 **     0,517 **     -0,072       -0,385 *     0,186 **     0,789 **
Control variables
  Number of words *1                  0,134 **     0,003 **     0,055 **     0,009 **     0,059 **     0,009 **
  Weekend day                         0,009        -0,056 **    -0,062 **    -0,322 **    -0,017 **    -0,094 **
Performance
  Model sign. *2                      **           **           **           **           **           **
  Adj. R^2                            1,70%        /            2,05%        /            1,20%        /
  AIC                                 /            1.329.148    /            572.216      /            466.172

* p < 0,05, ** p < 0,001
*1: OLSR: number of words is replaced by log(number of words + 1)

Table 22: Estimation Results for Brand Post Popularity, Marketeer's Orientation Approach

Marketeer's Orientation Approach      Reactions (B)             Shares (B)                Comments (B)
                                      OLSR         NBR          OLSR         NBR          OLSR         NBR
(Intercept)                           0,938 **     1,913 **     0,197 **     -0,026 *     0,160 **     -0,546 **
  Task-oriented                       -0,122 **    0,070 **     0,037 **     0,092 **     -0,077 **    0,134 **
  Relationship/Interaction-oriented   0,126 **     0,457 **     0,174 **     1,054 **     0,256 **     1,565 **
  Self-oriented                       0,376 **     0,511 **     x            -0,314 **    0,128 **     0,583 **
Control variables
  Number of words *1                  0,133 **     0,003 **     0,055 **     0,008 **     0,061 **     0,008 **
  Weekend day                         0,010        -0,057 **    -0,062 **    -0,320 **    -0,017 **    -0,091 **
Performance
  Model sign. *2                      **           **           **           **           **           **
  Adj. R^2                            1,69%        /            2,07%        /            1,39%        /
  AIC                                 /            1.329.084    /            572.015      /            465.684

* p < 0,05, ** p < 0,001
*1: OLSR: number of words is replaced by log(number of words + 1)
x Variable removed after stepwise variable selection

An extra test was needed for the NBR model to check whether each categorization variable is statistically significant on its own. We did this by comparing the MAM with and without the specific categorization variable in an ANOVA test. Previous research by Cvijikj & Michahelles (2013) also made use of this extra test. The main difference between their model and ours is that we allow overlap between the categorization variables within each approach (e.g. a Facebook post can be classified as both information and transaction), while this was not allowed in the study of Cvijikj & Michahelles (2013) (e.g. a message is classified as entertainment, information or remuneration as content type). Conducting the ANOVA is less valuable in our study, as we consider every categorization variable a binary factor variable on its own (0 or 1), while the content type variable of Cvijikj & Michahelles (2013) was a factor variable with multiple levels. All categorization variables as well as the two control variables were found to be significant factors for all types of BPP (p < 0.0001). In what follows, the effects of the explanatory variables are explained in relation to the number of reactions, shares and comments. We report the results of the OLSR and the NBR model of the MAM, and only mention the results of the related isolated approach when they deviate from the MAM. Not mentioning the isolated approach thus means that the effect is confirmed by the specific IM on its own. An overview of the estimated results of the isolated models can be found in Tables 18, 19, 20, 21 and 22.

14.4.3 Number of reactions

Topic Approach

Party was found to be a significant factor in the OLSR model, with a negative relationship with the number of reactions (β_{OLSR(MAM),party} = -0.165, p < 0.001), while party was not significantly related to the number of reactions in the NBR model (β_{NBR(MAM),party} = -0.018, p > 0.05). This is in contrast with the isolated topic model, where party was found to have a significant and positive effect on the number of reactions (β_{NBR(IM),party} = 0.076, p < 0.001). Providing content about food is significantly and negatively related to the number of reactions in the OLSR multi approach model (β_{OLSR(MAM),food} = -0.111, p < 0.001), but the NBR model found a significant positive relationship (β_{NBR(MAM),food} = 0.174, p < 0.001). The last topic (performance) was found to be significantly and positively related to the number of reactions (β_{OLSR(MAM),performance} = 0.039, p < 0.05; β_{NBR(MAM),performance} = 0.466, p < 0.001). This effect is confirmed by the isolated topic model, where the significance was even stronger (β_{OLSR(IM),performance} = 0.067, p < 0.001).


Base Content Approach

If we look at the BCA, providing information is significantly and positively related to the number of reactions (β_{OLSR(MAM),information} = 0.065, p < 0.001; β_{NBR(MAM),information} = 0.087, p < 0.001). For transaction/entertainment, the estimated results of OLSR and NBR contradict each other. While messages about transaction/entertainment are significantly and negatively related to the number of reactions in the OLSR method (β_{OLSR(MAM),transaction/entertainment} = -0.023, p < 0.001), the NBR model found a marginally positive association between transaction/entertainment and the number of reactions (β_{NBR(MAM),transaction/entertainment} = 0.024, p < 0.05). This marginal effect was confirmed by the isolated NBR model with an even lower p-value (β_{NBR(IM),transaction/entertainment} = 0.153, p < 0.001).

Content Approach

The CA splits the transaction/entertainment variable of the BCA into two separate variables. As mentioned before, to limit overlap between categorization variables, we left the information variable of the CA out of the MAM. However, we still looked at how informative posts behave in the isolated content model. A positive relationship was found between informative messages and the number of reactions (β_{OLSR(IM),information} = 0.058, p < 0.001; β_{NBR(IM),information} = 0.256, p < 0.001). Entertainment proved to be insignificant in the OLSR model and was removed from the model after stepwise variable selection. In contrast, the NBR model found entertainment to be significantly and positively related to the number of reactions (β_{NBR(MAM),entertainment} = 0.099, p < 0.001). If we look at the isolated model, entertainment was found to be significantly and negatively associated with the number of reactions based on the OLSR method (β_{OLSR(IM),entertainment} = -0.069, p < 0.001). A significant positive association was found between transaction and the number of reactions (β_{OLSR(MAM),transaction} = 0.109, p < 0.001; β_{NBR(MAM),transaction} = 0.363, p < 0.001).


Message Strategy Approach

Posts providing functional content are marginally significant and positively related to the number of reactions based on the OLSR method (β_{OLSR(MAM),functional} = 0.613, p < 0.05). According to the NBR method, no relationship was found between functional and the number of reactions (β_{NBR(MAM),functional} = -0.255, p > 0.05). While posts that stimulate behavioural responses (experiential) were left out of the OLSR model after stepwise variable selection, the NBR model found a significant and positive relation between experiential and the number of reactions (β_{NBR(MAM),experiential} = 0.092, p < 0.001). In contrast with the MAM, where experiential was left out of the OLSR model, the isolated OLSR model found a significant negative association with the number of reactions (β_{OLSR(IM),experiential} = -0.128, p < 0.001). Emotional posts were found to be positively related to the number of reactions (β_{OLSR(MAM),emotional} = 0.597, p < 0.001; β_{NBR(MAM),emotional} = 0.287, p < 0.05). The same holds for the relationship between brand resonance and the number of reactions (β_{OLSR(MAM),brand resonance} = 0.304, p < 0.001; β_{NBR(MAM),brand resonance} = 0.253, p < 0.05). The isolated model confirmed the effect of emotional and brand resonance on the number of reactions, with the NBR coefficients now significant at a p-value lower than 0.001 (β_{NBR(IM),emotional} = 0.506, p < 0.001; β_{NBR(IM),brand resonance} = 0.517, p < 0.001).

Marketeer’s Orientation Approach

Relationship/interaction-oriented was found to be significantly and positively related to the number of reactions (β_{OLSR(MAM),relationship} = 0.137, p < 0.001; β_{NBR(MAM),relationship} = 0.258, p < 0.001). The same results were found for self-oriented posts (β_{OLSR(MAM),self} = 0.233, p < 0.001; β_{NBR(MAM),self} = 0.236, p < 0.001). Different results were found for task-oriented posts (e.g. advertising, coupons, discounts etc.). Task-oriented was found to be marginally significant and negatively associated with the number of reactions (β_{OLSR(MAM),task} = -0.028, p < 0.05). The isolated OLSR approach confirmed this effect at a stronger significance level (β_{OLSR(IM),task} = -0.122, p < 0.001). While the NBR (MAM) model also found a significant and negative association between task-oriented posts and the number of reactions (β_{NBR(MAM),task} = -0.096, p < 0.001), the isolated NBR model found a significant and positive relation between task-oriented and the number of reactions (β_{NBR(IM),task} = 0.070, p < 0.001).

Control variables

In terms of the control variables, the number of words was found to be a significant and positively related factor for the number of reactions (β_{OLSR(MAM),number of words} = 0.124, p < 0.001; β_{NBR(MAM),number of words} = 0.001, p < 0.001). A Facebook post published on Saturday or Sunday was found to be significantly and negatively related to the number of reactions in the NBR model (β_{NBR(MAM),weekend day} = -0.053, p < 0.001). However, the OLSR model found a marginally significant positive effect of weekend days on the number of reactions (β_{OLSR(MAM),weekend day} = 0.011, p < 0.05). These effects were confirmed by the isolated approaches. Only for the MSA and the Marketeer’s Orientation Approach based on the OLSR model, no significant effect of weekend day on the number of reactions was found (MSA: β_{OLSR(IM),weekend day} = 0.009, p > 0.05; Marketeer’s Orientation Approach: β_{OLSR(IM),weekend day} = 0.010, p > 0.05).

14.4.4 Number of shares

Topic Approach

Messages that were assigned to one of the 3 topics (party, food & performance) are significantly and negatively related to the number of shares. The unstandardized coefficients for party, food and performance are all significant with a p-value lower than 0.001 for the OLSR model as well as the NBR model. Only content about music & concerts (performance topic) was not significantly related to the number of shares in the isolated model that made use of NBR (β_{NBR(IM),performance} = 0.024, p > 0.05).


Base Content Approach

Information is negatively and marginally significantly related to the number of shares (β_{OLSR(MAM),information} = -0.016, p < 0.05; β_{NBR(MAM),information} = -0.057, p < 0.05). This effect is confirmed and reinforced by the isolated BCA (β_{OLSR(IM),information} = -0.067, p < 0.001; β_{NBR(IM),information} = -0.335, p < 0.001). For transaction/entertainment, the outcomes of the two models (OLSR & NBR) conflict with each other. While OLSR found a significant negative association between transaction/entertainment and the number of shares of a post, NBR found a significant positive effect (β_{OLSR(MAM),transaction/entertainment} = -0.021, p < 0.001; β_{NBR(MAM),transaction/entertainment} = 0.095, p < 0.001).

Content Approach

Whether a BP is entertainment related has a significant positive influence on the number of shares (β_{OLSR(MAM),entertainment} = 0.034, p < 0.001; β_{NBR(MAM),entertainment} = 0.313, p < 0.001). The same conclusion can be drawn for transaction-oriented messages (β_{OLSR(MAM),transaction} = 0.125, p < 0.001; β_{NBR(MAM),transaction} = 0.827, p < 0.001). If we look at the isolated content model, which only includes information, entertainment and transaction as well as the two control variables (number of words & weekend day), information posts were found to be significantly and negatively related to the number of shares (β_{OLSR(IM),information} = -0.074, p < 0.001; β_{NBR(IM),information} = -0.228, p < 0.001). Secondly, entertainment was also found to be positively related to the number of shares in the isolated OLSR model, but only at a marginal significance level (β_{OLSR(IM),entertainment} = 0.010, p < 0.05).


Message Strategy Approach

Messages concerning functional content are significantly and positively related to the number of shares in the OLSR method (β_{OLSR(MAM),functional} = 0.586, p < 0.001), while no significant association was found in the NBR model (β_{NBR(MAM),functional} = 1.064, p > 0.05). In contrast, the isolated MSA that made use of NBR found a significant and positive relation between functional and the number of shares (β_{NBR(IM),functional} = 2.109, p < 0.001). Experiential was found to be a significant positive factor for the number of shares (β_{OLSR(MAM),experiential} = 0.041, p < 0.001; β_{NBR(MAM),experiential} = 0.229, p < 0.001). Posts providing emotional and brand resonance related content were left out of the OLSR (MAM) model after stepwise variable selection. This is in line with the results of the NBR (MAM) model, where emotional and brand resonance were found to be insignificant (β_{NBR(MAM),emotional} = -0.078, p > 0.05; β_{NBR(MAM),brand resonance} = -0.048, p > 0.05). If we look at the isolated model of the MSA, emotional BPs were found to be marginally significant and positively related to the number of shares in the OLSR model (β_{OLSR(IM),emotional} = 0.120, p < 0.05), while the NBR method found no significant relationship (β_{NBR(IM),emotional} = 0.033, p > 0.05). Conversely, a message about brand images and histories was found to be marginally significant and negatively related to the number of shares in the NBR method (β_{NBR(IM),brand resonance} = -0.385, p < 0.05), while the OLSR method found no significant association between brand resonance and the number of shares (β_{OLSR(IM),brand resonance} = -0.072, p > 0.05).


Marketeer’s Orientation Approach

Messages that provide content focused on increasing the interactivity between the followers and the brand page were found to be significantly and positively associated with the number of shares (β_{OLSR(MAM),relationship} = 0.131, p < 0.001; β_{NBR(MAM),relationship} = 0.496, p < 0.001). Self-oriented posts were removed from the MAM and the isolated model after stepwise reduction for OLSR. In the NBR model, no significant relation was found between self-oriented and the number of shares for the MAM (β_{NBR(MAM),self} = -0.164, p > 0.05), while the isolated model found a significant and negative relation with the number of shares (β_{NBR(IM),self} = -0.314, p < 0.001). Task-oriented was found to be marginally significant and negatively related to the number of shares in the OLSR model (β_{OLSR(MAM),task} = -0.017, p < 0.05). The NBR model confirmed this effect at an even stronger significance level (β_{NBR(MAM),task} = -0.143, p < 0.001). However, the isolated model found a significant and positive association between posts providing task-oriented content and the number of shares (β_{OLSR(IM),task} = 0.037, p < 0.001; β_{NBR(IM),task} = 0.092, p < 0.001).

Control variables

Posting messages during the weekend was found to be a significant and negatively related factor for the number of shares (β_{OLSR(MAM),weekend day} = -0.061, p < 0.001; β_{NBR(MAM),weekend day} = -0.315, p < 0.001). Conversely, the number of words of a message was found to be significantly and positively associated with the number of shares (β_{OLSR(MAM),number of words} = 0.083, p < 0.001; β_{NBR(MAM),number of words} = 0.004, p < 0.001).

14.4.5 Number of comments

Topic Approach

Party related messages are significantly and negatively related to the number of comments based on the OLSR model (β_{OLSR(MAM),party} = -0.069, p < 0.001), while party is not significantly associated with comments according to the NBR method (β_{NBR(MAM),party} = -0.028, p > 0.05). However, party was found to be significantly and positively related to the number of comments according to the isolated NBR model (β_{NBR(IM),party} = 0.237, p < 0.001), while the isolated OLSR model confirmed the significant negative relationship found by the OLSR MAM (β_{OLSR(IM),party} = -0.050, p < 0.001). Food is significantly and negatively related to the number of comments according to the OLSR model (β_{OLSR(MAM),food} = -0.057, p < 0.001), while food is marginally positively associated with the number of comments based on the NBR model (β_{NBR(MAM),food} = 0.086, p < 0.05). The isolated model confirmed this positive relationship between food and comments with an even lower p-value (β_{NBR(IM),food} = 0.339, p < 0.001). Posts related to performance were significantly and positively associated with the number of comments in the OLSR model as well as the NBR model (β_{OLSR(MAM),performance} = 0.050, p < 0.001; β_{NBR(MAM),performance} = 0.710, p < 0.001).


Base Content Approach

Providing information related content in Facebook posts is significantly and positively related to the number of comments (β_{OLSR(MAM),information} = 0.023, p < 0.001; β_{NBR(MAM),information} = 0.081, p < 0.001). The same estimated results were found for transaction/entertainment posts (β_{OLSR(MAM),transaction/entertainment} = 0.144, p < 0.001; β_{NBR(MAM),transaction/entertainment} = 0.267, p < 0.001). Compared to the isolated BCA, one difference was found. No relationship was found between transaction/entertainment and the number of comments according to the NBR model (β_{NBR(IM),transaction/entertainment} = 0.020, p > 0.05).

Content Approach

Information is significantly and positively related to the number of comments in the isolated model ($\beta_{\text{NBR(IM), information}} = 0.036$, p < 0.001; $\beta_{\text{OLSR(IM), information}} = 0.428$, p < 0.001). The results of the OLSR method and the NBR method for entertainment posts contradict each other: the OLSR method found a significant negative relation with the number of comments ($\beta_{\text{OLSR(MAM), entertainment}} = -0.015$, p < 0.001), whereas the NBR model found a positive relation between entertainment and the number of comments ($\beta_{\text{NBR(MAM), entertainment}} = 0.146$, p < 0.001). Posts about sweepstakes, bonuses, promotions, etc. (transaction content) are significantly and positively associated with the number of comments ($\beta_{\text{OLSR(MAM), transaction}} = 0.144$, p < 0.001; $\beta_{\text{NBR(MAM), transaction}} = 1.026$, p < 0.001).


Functional was left out of the OLSR model after stepwise variable selection. In line with this, the NBR method found no significant association between functional and the number of comments ($\beta_{\text{NBR(MAM), functional}} = 0.011$, p > 0.05). Still, the isolated OLSR model found a marginally significant positive association between functional and the number of comments ($\beta_{\text{OLSR(IM), functional}} = 0.253$, p < 0.05). BPs concerning sensory stimulation, physical stimulation or brand events were found to be significantly and positively related to the number of comments ($\beta_{\text{OLSR(MAM), experiential}} = 0.146$, p < 0.001; $\beta_{\text{NBR(MAM), experiential}} = 0.142$, p < 0.001). In contrast, the isolated OLSR model for the MSA found a significant negative association between experiential and the number of comments ($\beta_{\text{OLSR(IM), experiential}} = -0.068$, p < 0.001). Emotion-laden messages are significantly and positively related to the number of comments in the OLSR model ($\beta_{\text{OLSR(MAM), emotional}} = 0.178$, p < 0.001), while the NBR model found no significant relation ($\beta_{\text{NBR(MAM), emotional}} = 0.061$, p > 0.05). Posts providing brand-resonance-related content are marginally significantly and positively linked with the number of comments ($\beta_{\text{OLSR(MAM), brandresonance}} = 0.123$, p < 0.05; $\beta_{\text{NBR(MAM), brandresonance}} = 0.561$, p < 0.05). The isolated model confirmed the positive association between brand resonance and the number of comments at a lower significance level ($\beta_{\text{OLSR(IM), brandresonance}} = 0.186$, p < 0.001; $\beta_{\text{NBR(IM), brandresonance}} = 0.789$, p < 0.001).


A negative and significant association was found between task-oriented messages and the number of comments ($\beta_{\text{OLSR(MAM), task}} = -0.042$, p < 0.001; $\beta_{\text{NBR(MAM), task}} = -0.139$, p < 0.001). In contrast, the isolated NBR model found a significant positive relation between task-oriented messages and the number of comments ($\beta_{\text{NBR(IM), task}} = 0.134$, p < 0.001). Relationship/interaction-oriented content was found to be significantly and positively related to the number of comments ($\beta_{\text{OLSR(MAM), relationship}} = 0.222$, p < 0.001; $\beta_{\text{NBR(MAM), relationship}} = 0.770$, p < 0.001). Whereas the OLSR model found a significant positive association between self-oriented messages and the number of comments ($\beta_{\text{OLSR(IM), self}} = 0.068$, p < 0.001), no significant relation was found between them in the NBR model ($\beta_{\text{NBR(MAM), self}} = 0.177$, p < 0.001). In contrast to the insignificant relation found for the MAM (NBR) model, the isolated NBR model found a positive and significant relation between self-oriented posts and the number of comments ($\beta_{\text{NBR(IM), self}} = 0.583$, p < 0.001).

Control variables

The control variables behave in relation to the number of comments as they do for the number of shares. The number of words is significantly and positively associated with the number of comments ($\beta_{\text{OLSR(MAM), numberofwords}} = 0.045$, p < 0.001; $\beta_{\text{NBR(MAM), numberofwords}} = 0.002$, p < 0.001), while a post created on a weekend day is significantly and negatively related to the number of comments ($\beta_{\text{OLSR(MAM), weekendday}} = -0.015$, p < 0.001; $\beta_{\text{NBR(MAM), weekendday}} = -0.078$, p < 0.001).
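Because negative binomial regression models the logarithm of the expected count, an NBR coefficient can be read as a multiplicative effect after exponentiation. The short Python sketch below applies this to three NBR (MAM) coefficients for the number of comments reported in this section. It assumes the standard log link and that all other variables are held constant; the function name `incidence_rate_ratio` is introduced here purely for illustration and is not part of the original analysis.

```python
import math


def incidence_rate_ratio(beta: float) -> float:
    """Return exp(beta): the multiplicative change in the expected count
    for a one-unit increase in the predictor under a log link."""
    return math.exp(beta)


# NBR (MAM) coefficients for the number of comments, copied from this section.
nbr_coefficients = {
    "performance": 0.710,
    "transaction": 1.026,
    "weekendday": -0.078,
}

for name, beta in nbr_coefficients.items():
    irr = incidence_rate_ratio(beta)
    pct_change = (irr - 1.0) * 100.0
    print(f"{name}: IRR = {irr:.3f} ({pct_change:+.1f}% expected comments)")
```

A ratio above 1 corresponds to more expected comments and a ratio below 1 to fewer. The OLSR coefficients cannot be read in the same way, which is one reason the two models are compared on sign and significance rather than on magnitude.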


15 Discussion and managerial implications

15.1 Selection of Classification approach

This section discusses the evaluation results of the topic model and the supervised models in more detail. Our first content model used the unsupervised topic modelling approach. The three topics obtained from the model (party, food & performance) made sense, so overall we conclude that topic modelling is a viable classification approach on its own. A topic model is, however, highly dependent on the characteristics of the data: applying the approach to another industry will result in other overarching topics. For this reason, it is also difficult to compare the approach with the results of the supervised approaches. Marketeers can use this approach to see which themes and topics dominate their brand page.

Looking at the evaluation of the supervised approaches, we can draw the following conclusions. First of all, splitting the transaction/entertainment variable from the Base Content Approach into transaction and entertainment increased the classification performance of the two variables. Managers who want to take the content classification into account should therefore use the CA instead of the BCA. Secondly, the functional value had a poor prediction accuracy compared with the other variables of the Message Strategy Approach. This is mainly because our data come from the bar industry, where functional posts appear less often. This is also why the MSA was dominated by the experiential category. We would recommend that managers use this approach if the characteristics of their industry permit it (e.g. the categorization is more applicable to a manufacturing company, since posts mentioning functional claims are more common there than in the food or drink industry). The last approach (MOA) takes the mindset and strategy of the marketeers as a starting point and focuses less on the content, while the MSA lies closer to the Content Approach since it focuses on the strategy of the messages themselves. Managers who want to classify their posts at a higher strategy level and a lower content level can use the MOA. Overall, all four approaches are practicable for automatic content classification. The preferable approach to apply in practice depends on the following two factors.

(1) Industry: The most suitable approach mainly depends on the industry of the company. In our case, it is clear that more informal categorization variables (e.g. entertainment & experiential) perform better than more formal classification types (e.g. functional), due to the characteristics of bars & restaurants. Their posts address customers more informally than those of a company such as Volvo, which would post about a new feature online.

(2) Preferred viewpoint: Do the managers & marketeers want to classify posts by taking the content as a starting point (CA or BCA), or do they prefer to classify posts by taking their own perceived scheme of online engagement with customers as a starting point (MOA)? A third possible approach takes the strategy of different messages as a starting point (MSA).


15.2 Enhancing Brand post popularity

The results of the predictive impact analysis have shown that not all categorization types have a significant effect on the number of reactions, shares or comments. Moreover, the variables from the different classification approaches have shown different effects on BPP. In what follows, we give a more practical explanation of how marketeers or managers can use this research as a guideline for increasing online engagement. Exhibit 6 explains the method we used to decide which variables are worth mentioning in this section. Such a method is needed because some estimated results contradicted each other (e.g. a topic was found to be negatively related to the number of reactions in the OLSR model, while the NBR model found a positive association between that topic and the number of reactions). Two remarks should be kept in mind. First, our results are derived from the bar and restaurant industry, so the relationships between categorization variables and BPP may deviate in other sectors. Secondly, the low R-squared of the models made it impossible to make exact predictions.
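To make the idea of screening out contradictory estimates concrete, the following Python sketch encodes one possible consistency rule. The rule is an assumption made for illustration and is not taken from Exhibit 6: an effect is reported only when all significant estimates (p < 0.05) for a variable share the same sign. The names `Estimate` and `summarise_direction` are hypothetical. The example values are the comment-model estimates for entertainment, transaction and functional from Section 14.4.5.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Estimate:
    """One reported coefficient; the structure is illustrative only."""
    model: str          # e.g. "OLSR (MAM)" or "NBR (MAM)"
    beta: float         # estimated coefficient
    significant: bool   # True if the reported p-value is below 0.05


def summarise_direction(estimates: List[Estimate]) -> str:
    """Assumed rule: report an effect only if all significant estimates agree in sign."""
    significant = [e for e in estimates if e.significant]
    if not significant:
        return "no effect"
    signs = {e.beta > 0 for e in significant}
    if len(signs) > 1:
        return "unclear"
    return "positive" if signs.pop() else "negative"


# Estimates for the number of comments, copied from Section 14.4.5.
entertainment = [Estimate("OLSR (MAM)", -0.015, True), Estimate("NBR (MAM)", 0.146, True)]
transaction = [Estimate("OLSR (MAM)", 0.144, True), Estimate("NBR (MAM)", 1.026, True)]
functional = [Estimate("NBR (MAM)", 0.011, False)]

for name, estimates in [
    ("entertainment", entertainment),
    ("transaction", transaction),
    ("functional", functional),
]:
    print(name, "->", summarise_direction(estimates))
```

Under this assumed rule, entertainment is flagged as unclear because the two models disagree in sign, while transaction is reported as positive. How estimates from the isolated models should be weighed is left open here.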

15.2.1 Enhancing the number of reactions

Managers who want to enhance BPP by increasing the number of reactions (likes) should post content about concerts & music (performance topic). This may be because people feel happier when one of their favourite bars organizes a concert they do not want to miss. Moderators focusing on the content of the messages should post information- or transaction-related content to increase the number of reactions. A possible explanation for the positive association between information and the number of reactions is that followers are satisfied with the extra information the bar or restaurant posts (e.g. information about the opening hours). Furthermore, when marketeers post transaction-related content (sweepstakes, deals, bonuses, discounts, etc.), they sometimes require followers to like the post in order to make use of the promotion mentioned. Managers who focus on the MSA for content classification should post messages that are emotional or brand-resonance related to increase the number of reactions. Customers probably feel more connected to emotion-laden posts, which increases the probability that they will like the post. Additionally, to increase the number of reactions, marketeers should post self-oriented or relationship/interaction-oriented content. For interaction, the result is quite intuitive, since these posts contain votes and contests or ask for feedback (e.g. you have to like the post to enter the contest). It is also beneficial for the number of reactions to post during the week. This might be because people are busier during the weekend, with a fully booked schedule, while they probably have more “me time” during the week to check SM after a long day of work. Our research further indicates that longer messages have a positive impact on the number of reactions.


15.2.2 Enhancing the number of shares

Managers who want to enhance online engagement by increasing the number of shares should not focus on the topic of the messages: party, food and performance are negatively related to the number of shares. This might be explained by the fact that topics are formed by an unsupervised approach, which focuses on forming overarching subjects of the messages. These topics therefore do not correspond to pre-formed categorization variables, which are sometimes more outcome-oriented and less content-oriented (e.g. the goal of the relationship/interaction-oriented variable from the MOA is to increase interaction, which automatically translates into increased BPP). In contrast to its positive effect on the number of reactions and comments, information has a negative effect on the number of shares. From personal experience, if you want to show an interesting information post to one of your friends, you will prefer to tag them in the comments over sharing it with all of your followers on your personal page. Furthermore, managers should post entertainment- and transaction-related content to increase the number of shares. Looking at the MSA, emotional and brand resonance are unrelated to the number of shares, while the two variables were positively related to the number of reactions. A possible explanation could be that more informal posts (emotional & brand resonance) trigger likes rather than shares because customers feel more personally connected to these kinds of posts (e.g. a follower is more likely to like a post about an emotional goodbye of one of the bartenders than a functional explanation of a product). On the other hand, functional and experiential are positively related to the number of shares. Experiential posts are probably shared more because they aim to encourage physical & sensory stimulation. In the bar industry, these types of posts are most of the time combined with a small contest or sweepstake, which stimulates people to share the post in order to win a prize. A post focusing on relationship/interaction-oriented content will enhance the number of shares as well. The same suggestions can be made for the control variables as for the number of reactions: moderators or managers should not post on a Saturday or a Sunday if they want to increase the number of shares, and posting longer messages will also enhance the number of shares.

15.2.3 Enhancing the number of comments

For the relation between the topics and the number of comments, the same conclusion can be drawn as for the number of reactions: managers who want to increase the number of comments by making use of one of the topics should post about concerts & gigs. Furthermore, information and transaction/entertainment have a positive effect on the number of comments, as does transaction-related content on its own. The effect of entertainment on the number of comments remains unclear under the method used in Exhibit 6. Followers may be likely to comment on transaction posts because they tag their friends to let them know about the promotions of the bar. Furthermore, brand resonance has a positive association with the number of comments. In our dataset, most of the brand resonance posts consisted of wishing internal staff a happy birthday, so a possible explanation is that customers of the bar also wish the staff member a happy birthday by commenting on the post. Additionally, no effect was found between functional and the number of comments. Moderators who focus on the MOA can also post relationship/interaction-oriented and self-oriented messages to increase the number of comments (the same results were found for the number of reactions). Lastly, in line with the results for the number of reactions and shares, marketeers should post during the week and post longer messages to increase the number of comments. Longer messages are probably richer in information, so people will feel more attached to them.


16 Summary

This work tries to contribute empirically to a better understanding of how Facebook messages can be categorized and creates opportunities for managers to stimulate online engagement. More specifically, we first looked at how (content) classification approaches can be used for the classification of Social Media posts. Secondly, we looked at which approach is most suitable for automatic content classification. The performance was measured through Random Forest for the supervised approaches, while topic modelling was used as an unsupervised classification approach. Finally, we looked at which categorization factors influence brand post popularity, measured by the number of reactions, shares and comments. Our focus was on the bar/restaurant industry. The classification approaches listed below were derived from previous literature. For researchers, the proposed classification frameworks offer a starting point to categorize SM messages. They can also help managers to take a closer look at which classification approach best aligns with their strategy.

(0) Base Content Approach (information & entertainment/transaction)

(1) Content Approach (information, entertainment & transaction)

(2) Message Strategy Approach (functional, experiential, emotional & brand resonance)

(3) Marketeer’s Orientation Approach (task-oriented, relationship/interaction-oriented & self-oriented)

(4) Viral Marketing Rules Approach (promotion, product, entertainment & event)

Which automatic content classification approach a manager should adopt depends mainly on the industry and the preferred viewpoint. The Message Strategy Approach is less suitable for the bar industry than the (Base) Content Approach and the Marketeer’s Orientation Approach because of the functional category. Results showed that marketeers who want to take the content viewpoint as a starting point for message classification should use the Content Approach and not the Base Content Approach, since the classification performance of the entertainment/transaction variable increased when it was split into two separate categories. Furthermore, we do not recommend the Viral Marketing Rules Approach to managers who want to compare or consider all the different classification approaches, due to its overlap with the other, more distinctive approaches. Finally, marketeers who want to know the underlying themes of their brand page posts can make use of topic modelling.

Moreover, this paper analysed the characteristics of the different classification approaches that might influence online customer engagement, measured by brand post popularity (number of likes, comments & shares). Our results showed different relationships between the categorization variables of the different approaches and the number of reactions, comments & shares. Transaction-oriented content (Content Approach) and relationship/interaction-oriented content (Marketeer’s Orientation Approach) showed a positive relationship with brand post popularity, probably because these categorization types do more to encourage people to interact online. Furthermore, information (Base Content Approach & Content Approach) and performance (Topic Approach) showed a positive relation with the number of reactions and comments, but a negative association with the number of shares. Posts that are more company-oriented (self-oriented from the Marketeer’s Orientation Approach and brand resonance from the Message Strategy Approach) are positively related to the number of reactions and comments. Finally, posts created on weekdays and longer posts (containing more words in the message) increase the level of brand post popularity. These findings should encourage marketeers and managers to gain better insight into the (automatic) content classification of Facebook messages. Furthermore, these findings should also stimulate them to investigate whether categorization types originating from other classification approaches that companies are currently using influence online engagement with customers.


17 Limitations and further research

This research has some limitations that can be addressed in further research. First of all, our data come from bar and restaurant companies. As mentioned before, some approaches and variable categories are more useful in the manufacturing or transport industry than in the bar and restaurant industry. Further research could examine in more detail how each approach applies to different industries.

Secondly, only one person classified the 1000 labelled posts through human coding. As mentioned in the human coding classification section, labelling posts into a specific category type is quite subjective. Previous research has used workers from Amazon Mechanical Turk (AMT) to classify posts. AMT is a crowdsourcing marketplace for simple tasks; it enables the use of human intelligence to perform tasks that computers are unable to execute (Lee et al., 2018; Stephen et al., 2015). Since this is costly and would make us dependent on others, we did not use this technique. Still, to improve the classification of our different approaches, further research could use this technique or have more people label a larger data set in order to test the inter-coder consistency (Ashley & Tuten, 2015).
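As an illustration of how such inter-coder consistency could be quantified, the sketch below computes Cohen's kappa for two coders who labelled the same posts. Choosing Cohen's kappa is an assumption on our part, since no agreement statistic was computed in this study. The function name `cohens_kappa` and the six example labels are hypothetical; the categories are those of the Content Approach.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two coders on the same posts."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("Both coders must label the same non-empty set of posts.")
    n = len(coder_a)
    # Observed agreement: share of posts that received the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement if both coders labelled independently
    # according to their own label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label for every post
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six posts (not taken from the thesis data).
coder_1 = ["information", "entertainment", "transaction",
           "information", "entertainment", "transaction"]
coder_2 = ["information", "entertainment", "transaction",
           "entertainment", "entertainment", "information"]

print(f"kappa = {cohens_kappa(coder_1, coder_2):.2f}")
```

A kappa of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance. With more than two coders, a multi-rater statistic would be needed instead.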

Thirdly, our research focused on online engagement with brand pages by analysing the influence of the different approaches on BPP. Still, little research has analysed which content drives sales, profitability or consumer purchases. The study by Goh et al. (2013) analysed the impact of content on customers’ repeat purchase behaviour, while the study by Rishika et al. (2013) analysed the impact on profitability. Further research could analyse the effect of our proposed approaches on the sales of bars and restaurants.

Furthermore, our model was restricted in the number of valuable IVs. Only the classification variables of the different approaches and two extra control variables were included in our model. As mentioned before, whether a person likes, comments on or shares a BP obviously depends on more factors (e.g. social contagion, the mood of the person or even the weather outside). Additionally, posts in our data containing empty messages were treated as “empty” content; this is probably because these posts shared media-related content (e.g. videos, links, photos, images, etc.). Since no data were available on the media types used for the BPs, we could not include the media type in our models. Moreover, we did not adjust our DVs for the number of followers of the page that published each BP: the more people follow a page, the higher the likelihood of more likes or comments (Cvijikj & Michahelles, 2013). Finally, emoticons used in the messages were removed from the posts. Further analysis could take emoticons into account when analysing the effects of messages on online engagement. We are convinced that a richer dataset, taking more outside factors into account, will increase the performance of our models.
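One simple way in which further research could adjust the dependent variables for audience size is to express each count per 1,000 followers. The Python sketch below is only an illustration of that idea: follower counts were not part of our data, and the function name `engagement_per_1000_followers` and all figures are hypothetical.

```python
from typing import Dict


def engagement_per_1000_followers(counts: Dict[str, int], followers: int) -> Dict[str, float]:
    """Scale raw engagement counts to a rate per 1,000 page followers."""
    if followers <= 0:
        raise ValueError("followers must be a positive number")
    return {metric: 1000.0 * value / followers for metric, value in counts.items()}


# The same raw counts on a small and a large page (hypothetical figures).
post_counts = {"reactions": 50, "shares": 5, "comments": 10}
small_page = engagement_per_1000_followers(post_counts, followers=2_000)
large_page = engagement_per_1000_followers(post_counts, followers=20_000)

print("small page:", small_page)
print("large page:", large_page)
```

Identical raw counts then translate into a tenfold difference in relative engagement between the two pages. In a count model, the same adjustment is commonly made by including the logarithm of the follower count as an exposure offset rather than by dividing the dependent variable.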


Finally, our research is limited to content coming from Facebook. Over the last year, other platforms have gained importance for online engagement between companies and their customers. At the moment, Instagram is “the place to be” when it comes to social engagement. Further research could examine how the different classification approaches behave on Instagram, in order to allow companies to make efficient use of all their SM platforms.

Although our research has some limitations, we are convinced that this study is valuable to the literature on content classification of SM messages. Our study can be used as a guideline for further research that involves Social Media content classification.


References

2,7 miljoen Europeanen zijn getroffen door privacyschandaal Facebook. (2016, April 6). HLN. Retrieved from https://www.hln.be/nieuws/buitenland/2-7-miljoen-europeanen-zijn-getroffen-door-privacyschandaal-facebook~a6dfde33/

A gentle introduction to topic modeling using R. (2015). Retrieved 15 February 2019, from Eight to Late website: https://eight2late.wordpress.com/2015/09/29/a-gentle-introduction-to-topic-modeling-using-r/

Arnold, T. W. (2010). Uninformative Parameters and Model Selection Using Akaike’s Information Criterion. Journal of Wildlife Management, 74(6), 1175–1178. Retrieved from http://www.bioone.org/doi/abs/10.2193/2009-367

Ashley, C., & Tuten, T. (2015). Creative Strategies in Social Media Marketing: An Exploratory Study of Branded Social Content and Consumer Engagement. Psychology and Marketing, 32(1), 15–27.

Berger, J., & Milkman, K. L. (2012). What Makes Online Content Viral? Journal of Marketing Research, XLIX, 192–205. Retrieved from www.marketingpower.com/jmr_

Berk, R. A. (2007). Random Forests. In Statistical Learning from a Regression Perspective (second edi, pp. 205–258). Retrieved from http://www.springer.com/series/417

Breiman, L. (2001). Random Forests. Machine Learning, 45, 5–32.

Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and Applications. New York: Cambridge University Press.

Chen, J. (2019). Business to Business (B2B). Retrieved from Investopedia website: https://www.investopedia.com/terms/b/btob.asp

Cvijikj, I. P., & Michahelles, F. (2013). Online engagement factors on Facebook brand pages. Social Network Analysis and Mining, 3(4), 843–861.

Davis, R., Piven, I., & Breazeale, M. (2014). Conceptualizing the brand in social media community: The five sources model. Journal of Retailing and Consumer Services, 21, 468–481.

De Pelsmacker, P., & Van Kenhove, P. (2007). Marktonderzoek: methoden en toepassingen (2nd ed.). Pearson Education Benelux.

de Vries, L., Gensler, S., & Leeflang, P. S. H. (2012). Popularity of Brand Posts on Brand Fan Pages: An Investigation of the Effects of Social Media Marketing. Journal of Interactive Marketing, 26, 83–91.

De Vries, N. J., & Carlson, J. (2014). Examining the drivers and brand performance implications of customer engagement with brands in the social media environment. Journal of Brand Management, 21(6), 495–515.

Definition of entertainment. (n.d.). Retrieved 15 December 2018, from Oxford dictionaries website: https://en.oxforddictionaries.com/definition/entertainment

Donges, N. (2018). The Random Forest Algorithm. Medium. Retrieved from https://towardsdatascience.com/the-random-forest-algorithm-d457d499ffcd

Feinerer, I. (2018). Introduction to the tm Package Text Mining in R. Retrieved from https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf

Gareth, J., Witten, D., Hastie, T., & Tibshirani, R. (2017). An Introduction to Statistical Learning with Applications in R. In A modern approach to regression with R. Retrieved from http://books.google.com/books?id=9tv0taI8l6YC


Global social network penetration rate as of January 2019, by region. (2019). Retrieved 25 January 2019, from Statista website: https://www.statista.com/statistics/269615/social-network-penetration-by-region/

Goh, K.-Y., Heng, C.-S., & Lin, Z. (2013). Social media brand community and consumer behavior: Quantifying the relative impact of user- and marketer-generated content. Information Systems Research, 24(1), 88–107.

Gummerus, J., Liljander, V., Weman, E., & Pihlström, M. (2012). Customer engagement in a Facebook brand community. Management Research Review, 35(9), 857–877.

Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Diagnostic Radiology, 143(1), 29–36. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/7063747

Hopkins, D. J., & King, G. (2010). A Method of Automated Nonparametric Content Analysis for Social Science. In American Journal of Political Science (Vol. 54).

How to Interpret a Regression Model with Low R-squared and Low P values. (2014). Retrieved 2 May 2019, from The Minitab Blog website: https://blog.minitab.com/blog/adventures-in-statistics-2/how-to-interpret-a-regression-model-with-low-r-squared-and-low-p-values

Jahn, B., & Kunz, W. (2012). How to transform consumers into fans of your brand. Journal of Service Management, 23(3), 344–361.

Kim, D. H., Spiller, L., & Hettche, M. (2015). Analyzing media types and content orientations in Facebook for global brands. Journal of Research in Interactive Marketing, 9(1), 4–30.

Kiráľová, A., & Pavlíčeka, A. (2015). Development of Social Media Strategies in Tourism Destination. Procedia - Social and Behavioral Sciences, 175, 358–366.

Klema, V. C., & Laub, A. J. (1980). The Singular Value Decomposition: Its Computation and Some Applications. Transactions on Automatic Control, 25(2), 164–176.

Kremers, B. (n.d.). Electronic Word Of Mouth presents a window of opportunity for businesses. Retrieved 11 October 2018, from BuzzTALK website: https://www.buzztalkmonitor.com/blog/electronic-word-of-mouth-presents-a-window-of-opportunity-for-businesses/

Krug, S. (2016). Reactions Now Available Globally. Newsroom Facebook. Retrieved from https://newsroom.fb.com/news/2016/02/reactions-now-available-globally/

Larivière, B., & Van Den Poel, D. (2005). Predicting customer retention and profitability by using random forests and regression forests techniques. Expert Systems with Applications, 29, 472–484.

Lee, D., Hosanagar, K., & Nair, H. S. (2018). Advertising Content and Consumer Engagement on Social Media: Evidence from Facebook. Management Science, 64(11), 5105–5131.

Liaw, A., & Wiener, M. (2002). Classification and Regression by RandomForest. ResearchGate, 2, 18–22. Retrieved from https://www.researchgate.net/publication/228451484

Liu, Y., & Shrum, L. J. (2002). What Is Interactivity and Is It Always Such a Good Thing? Implications of Definition, Person, and Situation for the Influence of Interactivity on Advertising Effectiveness. Journal of Advertising, 31(4), 53–64.

Mangiafico, S. (2019). Functions to Support Extension Education Program Evaluation. Retrieved from http://rcompanion.org

Market-Revenue Per Internet User. (2019). Retrieved 25 January 2019, from Statista website: https://www.statista.com/outlook/220/100/social-media-advertising/worldwide#market-revenuePerInternetUser


Meire, M., Ballings, M., & Van den Poel, D. (2016). The added value of auxiliary data in sentiment analysis of Facebook posts. Decision Support Systems, 89, 98–112.

Most popular social networks worldwide as of April 2019, ranked by number of active users (in millions). (2019). Retrieved 25 January 2019, from Statista website: https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/

Murzintcev, N. (2019). Tuning of the Latent Dirichlet Allocation Models Parameters Description Estimates the best fitting number of topics. Retrieved from https://cran.r-project.org/web/packages/ldatuning/ldatuning.pdf

NEGATIVE BINOMIAL REGRESSION | R DATA ANALYSIS EXAMPLES. (n.d.). Retrieved 3 April 2019, from UCLA: Institute for Digital Research and Education website: https://stats.idre.ucla.edu/r/dae/negative-binomial-regression/

NEGATIVE BINOMIAL REGRESSION | STATA DATA ANALYSIS EXAMPLES. (n.d.). Retrieved 3 April 2019, from UCLA Institute for Digital Research and Education website: https://stats.idre.ucla.edu/stata/dae/negative-binomial-regression/

Netzer, O., Feldman, R., Goldenberg, J., & Fresko, M. (2012). Mine Your Own Business: Market-Structure Surveillance Through Text Mining. In Marketing Science (Vol. 31).

Number of monthly active Facebook users worldwide as of 1st quarter 2019 (in millions). (2019). Retrieved 25 January 2019, from Statista website: https://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/

Number of social media users worldwide from 2010 to 2021 (in billions). (2019). Retrieved 25 January 2019, from Statista website: https://www.statista.com/statistics/278414/number-of-worldwide-social-network-users/

Porterfield, A. (2011, December). 4 Ways to Convert Facebook Fans Into Super Fans. Mashable. Retrieved from https://mashable.com/2011/12/12/facebook-fans-super-fans/?europe=true

Random Forests. (n.d.). Retrieved 21 March 2019, from Metagenomics. Statistics. website: https://dinsdalelab.sdsu.edu/metag.stats/code/randomforest.html

Rishika, R., Kumar, A., Janakiraman, R., & Bezawada, R. (2013). The Effect of Customers’ Social Media Participation on Customer Visit Frequency and Profitability: An Empirical Investigation. Information Systems Research, 24(1), 108–127.

Sabate, F., Berbegal-Mirabent, J., Cañabate, A., & Lebherz, P. R. (2014). Factors influencing popularity of branded content in Facebook fan pages. European Management Journal, 32, 1001–1011.

Sentiment Analysis. (n.d.). Retrieved 20 January 2019, from Techopedia website: https://www.techopedia.com/definition/29695/sentiment-analysis

Setty, S., Jadi, R., Shaikh, S., Mattikalli, C., & Mudenagudi, U. (2014). Classification of Facebook News Feeds and Sentiment Analysis. International Conference on Advances in Computing, Communications and Informatics (ICACCI), 18–23. Institute of Electrical and Electronics Engineers Inc.

Shen, B., & Bissell, K. (2013). Social Media, Social Me: A Content Analysis of Beauty Companies’ Use of Facebook in Marketing and Branding. Journal of Promotion Management, 19(5), 629–651.

Silge, J., & Robinson, D. (2017). Text Mining with R: A Tidy Approach. Retrieved from https://www.tidytextmining.com

Social Media Statistics & Facts. (2019). Retrieved 19 February 2019, from Statista website: https://www.statista.com/topics/1164/social-networks/

Stephen, A. T., Sciandra, M. R., & Inman, J. J. (2015). Is it What You Say or How You Say It? How Content Characteristics Affect Consumer Engagement with Brands on Facebook.

Swani, K., Brown, B. P., & Milne, G. R. (2014). Should tweets differ for B2B and B2C? An analysis of Fortune 500 companies’ Twitter communications. Industrial Marketing Management, 43, 873–881.

Swani, K., Milne, G. R., Brown, B. P., Assaf, A. G., & Donthu, N. (2017). What messages to post? Evaluating the popularity of social media communications in business versus consumer markets. Industrial Marketing Management, 62, 77–87.

Tafesse, W. (2015). Content strategies and audience response on Facebook brand pages. Marketing Intelligence and Planning, 33(6), 927–943.

Tafesse, W., & Wien, A. (2017). A framework for categorizing social media posts. Cogent Business and Management, 4(1).

Tomaras, P., & Ntalianis, K. (2015). Evaluating the Impact of Posted Advertisements on Content Sharing Sites: An Unsupervised Social Computing Approach. Procedia - Social and Behavioral Sciences, 175, 219–226.

Your Easy Guide to Latent Dirichlet Allocation. (2018). Medium. Retrieved from https://medium.com/@lettier/how-does-lda-work-ill-explain-using-emoji-108abf40fa7d

Zhang, Y., Moe, W. W., & Schweidel, D. A. (2017). Modeling the role of message content and influencers in social media rebroadcasting. International Journal of Research in Marketing, 34, 100–119.

Zhao, W. X., Jiang, J., Weng, J., He, J., & Lim, E.-P. (2011). Comparing Twitter and Traditional Media Using Topic Models. The School of Information Systems at Institutional Knowledge at Singapore Management University.

Appendix

EXHIBIT 1 Literature review
EXHIBIT 2 Random ID check of Facebook pages
EXHIBIT 3 Distribution plot of dependent variables
EXHIBIT 4 Descriptive overview
EXHIBIT 5 Performance of information RF model
EXHIBIT 6 Method for overall estimation results

EXHIBIT 1 Literature review

Columns marked with an x in the original table: Main focus (content itself; type of content); Media elements (photos, links, videos, …); Learning approach (unsupervised learning; supervised learning); Comparison of different categorization techniques; Dependent variables (engagement; sales). The marks are listed per study as extracted; their column positions are not preserved.

Authors: Ashley & Tuten (2015)
Marks: x xx
Research method: predictive
Industry: InterBrand's Best Global Brands
Data source: Facebook, Myspace, Twitter, blogs & forums
Categorization: IV: Tweets, number of channels, resonance, animation, user image appeal, exclusivity appeals, functional appeals, experiential appeals, emotional appeals, social cause & incentive to share content. DV: Number of people following, Facebook fans, social influence, followers & engagement score.
Extra information: Focus on which social media channels & creative strategies are used and how they are related to consumer engagement (score EngagementdB). Focus on correlation with engagement score.

Authors: Berger & Milkman (2012)
Marks: xx x
Research method: predictive
Industry / data source: New York Times articles
Categorization: Content: Anger, anxiety, sadness, awe, emotionality, positivity. DV: Position on most e-mailed list.
Extra information: Study 1: Field study of emotions and virality of NYT. Study 2: How high-arousal emotions affect transmission. Study 3: How deactivating emotions affect transmission. Control variables (practical utility, interesting & surprising).

Authors: Cvijikj & Michahelles (2013)
Marks: x xx x
Research method: predictive
Industry: FMCG (food/beverages)
Data source: Facebook
Categorization: Content: Entertainment, information, remuneration, vividness, interactivity & posting time (workday & peak hours). DV: Likes, comments, shares and interaction duration (engagement).
Extra information: Which content should be posted and when.

Authors: Davis et al. (2014)
Marks: x
Research method: qualitative
Industry: /
Data source: Facebook
Categorization: Functional brand consumption, emotional brand consumption, self-oriented brand consumption, social brand consumption and relational brand consumption.
Extra information: Five Sources Model. Qualitative research through focus groups & offline interviews.

Authors: De Vries & Carlson (2014)
Marks: x
Research method: qualitative
Industry: product & service
Data source: Facebook (survey related)
Categorization: Functional value, hedonic value, social value & co-creation value.
Extra information: Qualitative research through questionnaire. Focus on customer engagement.

Authors: de Vries et al. (2012)
Marks: xx xx
Research method: predictive
Industry: 6 product categories: food, accessories, leisure wear, alcoholic beverages, cosmetics & mobile phones.
Data source: Facebook
Categorization: Content: Vividness, interactivity, informational content, entertainment content, position & valence of comments. DV: Likes & comments (brand post popularity).
Extra information: Which content should be posted.

Authors: Goh et al. (2013)
Marks: xx x
Research method: predictive
Industry: retailer
Data source: Facebook
Categorization: Content: information richness (informative effect) & valence (persuasive effect). DV: Total purchase expenditure.
Extra information: User-generated content vs. marketer-generated content. Directed communication vs. undirected communication.

Authors: Gummerus et al. (2012)
Marks: x
Research method: qualitative
Industry: gaming
Data source: Facebook (survey related)
Categorization: social, entertainment & economic.
Extra information: Qualitative research through questionnaire. Focus on loyalty.

Authors: Hopkins & King (2010)
Marks: x x
Research method: predictive
Industry: presidential election
Data source: Blog
Categorization: Content: extremely negative, negative, neutral, positive, extremely positive, no opinion and not a blog.
Extra information: Focus on automated content analysis. Introducing a method for estimating document category proportions.

Authors: Jahn & Kunz (2012)
Marks: x
Research method: qualitative
Industry: different industries
Data source: Facebook (survey related)
Categorization: content-oriented (functional value & hedonic value), relationship-oriented (social interaction value & brand interaction value) & self-oriented (self-concept value).
Extra information: Qualitative research through questionnaire. Focus on satisfaction & loyalty.

Authors: Kim et al. (2015)
Marks: x xx x
Research method: predictive
Industry: Five product categories: convenience, shopping, specialty, industrial & services. (Best Global Brands 2012)
Data source: Facebook
Categorization: IV: Task-oriented (e.g. new product launch, advertising, online coupons/discounts/contests), interaction-oriented (e.g. picture, image, video, personal statement, special event, opinion, talks about season/weather or entertainment, asking for likes/comments/shares/answers) & self-oriented (e.g. company information, brand event, employee mentions). DV: likes, comments & shares.
Extra information: Five product categories: convenience, shopping, specialty, industrial & services.

Authors: Lee et al. (2018)
Marks: xx xx
Research method: predictive
Industry: Interbrand's best global 100 brands, 6 categories: celebrities and public figures, entertainment, consumer products and brands, organizations and company, websites & local places and businesses
Data source: Facebook
Categorization: Content: Directly informative content & brand personality-related content. DV: Likes, comments, shares and click-throughs.
Extra information: Takes EdgeRank into account. Techniques used: Amazon Mechanical Turk & natural language processing. In-depth analysis of the 2 categorizations.

Authors: Meire et al. (2016)
Marks: xx
Research method: predictive
Industry: sport (soccer)
Data source: Facebook
Categorization: IV: features of focal post (lexicon, lexical, syntactic & time) & auxiliary features (leading & lagging).
Extra information: Focus on sentiment analysis.

Authors: Netzer et al. (2012)
Marks: xx
Research method: descriptive
Industry: Case 1: sedan cars. Case 2: diabetes drugs.
Data source: Forums
Categorization / extra information: Relationship between user-generated content and market structure

Authors: Rishika et al. (2013)
Marks: x x
Research method: predictive
Industry: retailer (wine)
Data source: Facebook
Categorization: IV: Customers' participation ("fan"). DV: The intensity of the customer-firm relationship (visit frequency) & profitability.
Extra information: Interaction effects with customer characteristics: purchase amount, focus of buying, deal sensitivity, share of premium products, loyalty, age, gender, income & race.

Authors: Sabate et al. (2014)
Marks: x xx x
Research method: predictive
Industry: Spanish travel agencies
Data source: Facebook
Categorization: IV: richness (images, videos & links), time frame (day of the week, time of publication). DV: popularity (number of likes & number of comments).
Extra information: Control variables: length of the wall post & number of followers.

Authors: Setty et al. (2014)
Marks: x xx
Research method: predictive
Industry: different industries
Data source: Facebook
Categorization: liked pages posts, entertainment posts & life event posts.
Extra information: Focus on classification of Facebook posts and automatic sentiment analysis.

Authors: Shen & Bissell (2013)
Marks: x xx
Research method: exploratory
Industry: Cosmetic (beauty)
Data source: Facebook
Categorization: IV: Event, product (e.g. product launch, product reviews and tips), promotion (e.g. coupon, sample and giveaways) & entertainment (e.g. beauty poll, Q&A, survey, activity with reward and applications, application services within the Facebook page). DV: likes & comments.

Authors: Stephen et al. (2015)
Marks: xx xx
Research method: predictive
Industry: 4 industries: consumer-packaged goods, restaurants, retail & sports
Data source: Facebook
Categorization: Content: arousal-oriented, persuasion-oriented, information & calls to action. DV: Attitudinal responses (likes & negatives) & marketing outcomes (brand exposure (reach), feedback (comments), word of mouth (shares) & website traffic referrals (clicks)).
Extra information: Focus on what is said (information characteristics) and how it is said (persuasion characteristics). Takes audience mix into consideration (core vs. core + noncore fans).

Authors: Swani et al. (2014)
Marks: xx
Research method: predictive
Industry: Fortune 500
Data source: Twitter
Categorization: Corporate brand name, product brand name, functional appeals, emotional appeals, direct calls to purchase, information search & hashtags.
Extra information: Focus on differences between B2B and B2C.

Authors: Swani et al. (2017)
Marks: xx x
Research method: predictive
Industry: Fortune 500
Data source: Facebook
Categorization: IV: Brand cue (corporate name & product name), message appeal (functional appeal & emotional appeal), selling strategy & information search. DV: Likes & comments.
Extra information: Focus on differences between B2B and B2C.

Authors: Tafesse (2015)
Marks: x xx x
Research method: predictive
Industry: automobile
Data source: Facebook
Categorization: IV: Vividness, interactivity, novelty, brand consistency & content type (transactional, informational & entertainment). DV: likes & shares.
Extra information: Control variables (fan numbers, posting date & vehicle category).

Authors: Tafesse & Wien (2017)
Marks: x xx **
Research method: qualitative content analysis (deductive, inductive & validation coding)
Industry: InterBrand's Best Global Brands
Data source: Facebook
Categorization: Emotional, functional, educational, brand resonance, experiential, current event, personal, employee, brand community, customer relationship, cause-related & sales promotion.
Extra information: Qualitative content analysis. Also a summary of previous literature based on the proposed categorizations.

Authors: Zhang et al. (2017)
Marks: x x
Research method: descriptive (factor analysis)
Industry: business schools
Data source: Twitter
Categorization: Content: 3 factors: school, finance & politics. DV: rebroadcasting activity.

Authors: Our study
Marks: x xx xx x
Research method: predictive
Industry: Bars & restaurants
Data source: Facebook
Categorization: Model 0: Base Content Approach. Model 1: Content Approach. Model 2: Message Strategy Approach. Model 3: Marketeer's Orientation Approach. Model 4: Viral Marketing Rules Approach. Unsupervised Approach. Media Type Approach. DV: reactions, comments & shares.
Extra information: Comparison of different models of categorization (supervised & unsupervised).

* IV = Independent variable(s), DV = Dependent variable(s)
** Literature comparison, no impact analysis comparison

EXHIBIT 2 Random ID check of Facebook pages

Page_name (ID) | Page name on Facebook | Page tags
105075676201163 | The Bank | Bar
102555626452916 | Neighborly Bar | Bar, sportbar
641671832534219 | Corner Street Pub | Bar, restaurant, concert place
152649688079204 | The Stables | Bar
128652537199394 | Doc Holliday's Saloon Tombstone, Arizona | Bar
973533759338577 | Bottoms Up | Bar
1434430990167030 | Ritz On The River | Bar
1682898595277000 | Cardinal Cage | Bar
316853401797893 | Bye the Willow | Bar, winebar
59540095667 | The Malt House | Bar, art & entertainment, whiskybar

EXHIBIT 3 Distribution plot of dependent variables

EXHIBIT 4 Descriptive overview

Figure 4-1: Most commonly used words

Figure 4-2: Wordcloud of most used words (no condition)

Figure 4-3: Wordcloud of most used words (3% condition)

Figure 4-4: Wordcloud of most used words (3% condition and less meaningful words condition)

Table 4-1: Descriptive statistics 1: Facebook posts dataset

Variable | Distinct values | Minimum | Maximum | Mean | SD | Median
Feed_id | 240210 | / | / | / | / | /
Feed_created_time | 239435 | 2011-01-01 01:41:37 UTC | 2016-12-30 23:53:11 UTC | / | / | /
Feed_message | 193845 | / | / | / | / | /
Reactions_count | 602 | 0 | 2623 | 7,5982 | 32,42 | 2
Comments_count | 165 | 0 | 1506 | 0,8034 | 7,41 | 0
Shares_count | 230 | 0 | 13969 | 1,2377 | 30,82 | 0
Page_name | 476 | / | / | / | / | /
Extracted_on | 2653 | 2017-06-07 13:23:00 UTC | 2017-06-07 14:59:01 UTC | / | / | /
Feed_message_length | 2481 | 0 | 15000 | 145,78 | 263 | 91
Feed_words_length | 582 | 0 | 2704 | 24,91 | 42,75 | 16
Weekend day | 2 | 0 (73,69%) | 1 (26,31%) | / | / | /

Table 4-2: Descriptive statistics 2: classification approaches variables

Variable | Occurrence (#) | Relative frequency (%)
TM_1 | 95397 | 39,71%
TM_2 | 17865 | 7,44%
TM_3 | 86218 | 35,89%
MD0_INF | 43241 | 18,00%
MD0_TRA.ENT | 147435 | 61,38%
MD1_INF | 43120 | 17,95%
MD1_ENT | 131148 | 54,60%
MD1_TRA | 7388 | 3,08%
MD2_FUN | 21 | 0,01%
MD2_EXP | 158829 | 66,12%
MD2_EMO | 176 | 0,07%
MD2_BRA | 141 | 0,06%
MD3_TAS | 151278 | 62,98%
MD3_REL | 1238 | 0,52%
MD3_SEL | 1051 | 0,44%
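The relative frequencies in Table 4-2 appear to be each label's occurrence count divided by the total number of posts. The short Python sketch below checks this for a few rows, assuming the denominator is the 240210 distinct Feed_id values from Table 4-1. The function name `relative_frequency` is illustrative only and does not come from the thesis.

```python
TOTAL_POSTS = 240210  # assumption: distinct Feed_id values reported in Table 4-1

# Occurrence counts copied from Table 4-2 (a subset, for brevity).
OCCURRENCES = {"TM_1": 95397, "TM_2": 17865, "MD1_TRA": 7388, "MD2_FUN": 21}


def relative_frequency(count: int, total: int = TOTAL_POSTS) -> float:
    """Share of posts (in percent, two decimals) that carry a given label."""
    return round(100 * count / total, 2)


if __name__ == "__main__":
    for name, count in OCCURRENCES.items():
        print(f"{name}: {relative_frequency(count):.2f}%")
```

Under that assumption the computed shares match the percentages reported in the table for these rows (the table writes them with a decimal comma).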

EXHIBIT 5 Performance of information RF model

Figure 5-1: Mean decrease in accuracy (MDA) for information, Base Content Approach

Figure 5-2: Mean decrease in Gini (MDG) for information, Base Content Approach

EXHIBIT 6 Method for overall estimation results

To check the relationship between a specific classification variable and one of the three DVs (number of reactions, shares or comments), we examined the effect of that variable in four different models (Table 6-1). First, we checked the estimate when all variables were included in one model (multi approach model, MAM). Second, we checked the estimate when the model was built with only the variables from the related classification approach (isolated model, IM). As mentioned before, each of these was fitted both as an OLSR model and as an NBR model.
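The four models described above form a 2 x 2 grid: model scope (multi approach vs. isolated) crossed with estimator (OLSR vs. NBR). The Python sketch below only enumerates that grid and the predictor set each model would use; it fits nothing. The grouping of variable names into approaches is inferred from the variable prefixes in Table 4-2, and `APPROACHES` and `model_grid` are hypothetical names. Control variables and stepwise reduction are left out.

```python
from itertools import product

# Assumed grouping of the Table 4-2 variables by classification approach.
APPROACHES = {
    "Topic": ["TM_1", "TM_2", "TM_3"],
    "Base Content": ["MD0_INF", "MD0_TRA.ENT"],
    "Content": ["MD1_INF", "MD1_ENT", "MD1_TRA"],
    "Message Strategy": ["MD2_FUN", "MD2_EXP", "MD2_EMO", "MD2_BRA"],
    "Marketeer's Orientation": ["MD3_TAS", "MD3_REL", "MD3_SEL"],
}


def model_grid(variable: str) -> list:
    """List the four models in which the effect of `variable` is checked."""
    # Find the approach the variable belongs to (StopIteration if unknown).
    approach = next(name for name, members in APPROACHES.items()
                    if variable in members)
    predictors = {
        # Multi approach model: variables of all approaches together.
        "MAM": [v for members in APPROACHES.values() for v in members],
        # Isolated model: only the variables of the variable's own approach.
        "IM": list(APPROACHES[approach]),
    }
    grid = product(["MAM", "IM"], ["OLSR", "NBR"])
    return [
        {"model": number, "scope": scope, "estimator": estimator,
         "predictors": predictors[scope]}
        for number, (scope, estimator) in enumerate(grid, start=1)
    ]


if __name__ == "__main__":
    for spec in model_grid("MD2_EMO"):
        print(spec["model"], spec["scope"], spec["estimator"],
              len(spec["predictors"]), "predictors")
```

The numbering follows Table 6-1: models 1 and 2 are the multi approach model under OLSR and NBR, models 3 and 4 the isolated model under OLSR and NBR.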

Table 6-1: Models applied per categorization variable

Model | OLSR | NBR
Multi approach model | 1 | 2
Isolated model | 3 | 4

For each variable, we compared to what extent the estimated results from the four models are in line with each other. Table 6-2 gives an overview of the overall estimation result per variable. If the results for a specific variable were in line across the four models, we classified the overall effect as green, with the corresponding direction. E.g., Emotional from the MSA was found to be significantly and positively related to the number of reactions in the MAM. The unstandardized coefficients were significant with a p-value lower than 0.001 for the OLSR method and lower than 0.05 for the NBR method ($\beta_{\text{OLSR(MAM)},\,\text{Emotional}} = 0.597$, p < 0.001; $\beta_{\text{NBR(MAM)},\,\text{Emotional}} = 0.287$, p < 0.05). The isolated MSA model confirmed this significant positive association with the number of reactions for both the OLSR and the NBR method. So in general, we can conclude that Emotional is positively related to the number of reactions. We carry these green effects over into our managerial implications section.

Variables that had an insignificant coefficient, or that were removed after stepwise reduction, in one of the four models, while all other coefficients were significant and related to BPP in the same direction, were classified as orange. Even if one of the estimated effects is insignificant, we still assume that an overall effect can be formed, since the association runs in a single direction (positively or negatively related to BPP). These estimated effects are therefore also worth mentioning in the discussion and managerial implications. E.g., brand resonance was removed after stepwise reduction in the MAM (OLSR) and was found to be insignificantly related to the number of shares in the IM (OLSR) ($\beta_{\text{OLSR(IM)},\,\text{Brand resonance}} = -0.072$, p > 0.05). With the NBR method, brand resonance was insignificantly related to the number of shares in the MAM ($\beta_{\text{NBR(MAM)},\,\text{Brand resonance}} = -0.048$, p > 0.05) and marginally significantly and negatively associated with the number of shares in the IM ($\beta_{\text{NBR(IM)},\,\text{Brand resonance}} = -0.385$, p < 0.05). Since three of the four estimated effects were insignificant or removed, we make the overall assumption that brand resonance is not significantly related to the number of shares.

All other effects, which belong to neither the green nor the orange category, were classified as red. This means that contrary effects were found in at least two of the four estimated results. E.g., Food was found to be significantly and negatively related to the number of reactions with the OLSR method ($\beta_{\text{OLSR(MAM)},\,\text{Food}} = -0.111$, p < 0.001; $\beta_{\text{OLSR(IM)},\,\text{Food}} = -0.090$, p < 0.001), whereas it was significantly and positively related to the number of reactions with the NBR method ($\beta_{\text{NBR(MAM)},\,\text{Food}} = 0.229$, p < 0.001; $\beta_{\text{NBR(IM)},\,\text{Food}} = 0.262$, p < 0.001). Since these effects contradict each other, we cannot make an assumption about the overall effect and leave these results out of our managerial implications section.

SUMMARY:

- Green: All four coefficients for a specific variable are (marginally) significant and related to BPP (number of reactions, shares or comments) in the same direction.

- Orange: Some coefficients are insignificant (or removed) in one of the four models; the other, significant coefficients are related to BPP (number of reactions, shares or comments) in the same direction.

- Red: At least one of the four coefficients is (marginally) significant and positively related to BPP, and at least one other coefficient is (marginally) significant and negatively related to BPP.
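The colour rules summarised above can be expressed as a small decision function. The Python sketch below is one possible reading of them, not the procedure actually used in the thesis. `Estimate` and `overall_effect` are hypothetical names. Significance is passed in as a flag, because the text reports only p-value bounds. The `max_insignificant=1` threshold for orange and the extra "insignificant" outcome are assumptions, chosen so that the brand resonance example resolves as the text describes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Estimate:
    """One coefficient from one of the four models (MAM/IM x OLSR/NBR)."""
    coef: Optional[float]  # None when stepwise reduction removed the variable
    significant: bool      # True when (marginally) significant


def overall_effect(estimates: List[Estimate],
                   max_insignificant: int = 1) -> Tuple[str, Optional[str]]:
    """Return (category, direction) for the four estimates of one variable."""
    # Directions of the significant coefficients only.
    signs = {"+" if e.coef > 0 else "-"
             for e in estimates
             if e.coef is not None and e.significant}
    n_insignificant = sum(1 for e in estimates
                          if e.coef is None or not e.significant)
    if len(signs) == 2:
        return ("red", None)            # contradictory significant effects
    if not signs or n_insignificant > max_insignificant:
        return ("insignificant", None)  # assumption: too little support
    direction = next(iter(signs))
    return ("green" if n_insignificant == 0 else "orange", direction)


# Food vs. number of reactions (coefficients from the text, all p < 0.001).
food = [Estimate(-0.111, True), Estimate(-0.090, True),
        Estimate(0.229, True), Estimate(0.262, True)]

# Brand resonance vs. number of shares: removed in MAM (OLSR), insignificant
# in IM (OLSR) and MAM (NBR), significant and negative in IM (NBR).
brand_resonance = [Estimate(None, False), Estimate(-0.072, False),
                   Estimate(-0.048, False), Estimate(-0.385, True)]

if __name__ == "__main__":
    print(overall_effect(food))
    print(overall_effect(brand_resonance))
```

With these inputs the Food example is classified as red and the brand resonance example as insignificant, in line with the conclusions drawn in the text. A different choice of `max_insignificant` would change where the line between orange and insignificant falls.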

Table 6-2: Estimated overall effect of the variables over the different models

Variable | Reactions | Shares | Comments
(Intercept) | + | + | /
Topic Approach
Party | / | - | /
Food | / | - | /
Performance | + | - | +
Base Content Approach
Information | + | - | +
Transaction/Entertainment | / | / | +
Content Approach
Information | + | - | +
Entertainment | / | + | /
Transaction | + | + | +
Message Strategy Approach
Functional | / | + | x
Experiential | / | + | /
Emotional | + | x | /
Brand Resonance | + | x | +
Marketeer's Orientation Approach
Task-oriented | / | / | /
Relationship/Interaction-oriented | + | + | +
Self-oriented | + | / | +
Control variables
Number of words | + | + | +
Weekend day | - | - | -
