An Automatic Classification Approach to Business Stakeholder Analysis on the Web Wingyan Chung, Hsinchun Chen, Edna O. F. Reid January 16, 2003

An Automatic Classification Approach to Business Stakeholder Analysis on the Web

Wingyan Chung, Hsinchun Chen, Edna O. F. Reid

January 16, 2003


Agenda

• Introduction
• Literature Review
• Research Questions
• Research Approach and Testbed
• Evaluation Methodology
• Experimental Results and Discussion
• Conclusions and Future Directions


Introduction


Current Business Environment

• Networked business environment facilitates information sharing

• Collaborative commerce integrates business processes among partners through electronic sharing of information
  – Sales support, vendor management, planning and scheduling, demand planning, etc.

• Knowledge sharing about stakeholder relationships through a company’s Web sites and pages
  – Textual content or annotated hyperlinks


Problems

• Information overload on the Web
  – Hinders analysis of stakeholder relationships

• Knowledge hidden in interconnected Web resources
  – Posing challenges to identifying and classifying various business stakeholders
    • e.g., a company’s manager may not know who is using the company’s Web resources
  – Problem of traditional stakeholder analysis
  – The emergence of electronic commerce


An Automatic Classification Approach

• Need better approaches to uncovering such knowledge
  – Enhance understanding of business stakeholders
  – Enhance understanding of competitive environments

• We propose an automatic classification approach to business stakeholder analysis
  – Human knowledge + machine-learned information

• We will review related areas in stakeholder analysis and Web page classification techniques


Literature Review


Stakeholder Analysis

• Stakeholder theories evolve over time as the view of the firm changes
  – Production view (19th century): Suppliers and Customers
  – Managerial view (20th century): + Owners, Employees
  – Stakeholder view (1960-80s) (Freeman, 1984): + Competitors, Governments, News Media, Environmentalists, …
  – E-commerce view (1990s - now): + International partners, Online communities, Multinational employees, …


Summary of stakeholder types

Research | Stakeholder Types
Reid, 2003 | Partners/suppliers, customer, employee, investor, education institutions, media, portal, public, recruiter, reviewer, competitor, unknown
Elias & Cavana, 2000 | Owners, community, unions, employees, government, consumer advocates, competitors, financial community, media, customers, SIG, suppliers
Agle et al., 1999 | Shareholders, employees, customers, government, communities
Donaldson & Preston, 1995 | Investors, government, suppliers, trade associations, employees, communities, customers, political groups
Clarkson, 1995 | Employees, shareholders, customers, suppliers, public stakeholders

• These types, ordered by their relevance to those appearing on the Web, are important for practical understanding of stakeholders of firms


Comparing Stakeholder Types* Used

Columns (stakeholder types): P, E, C, S, U, M, G, R, V, O, T, F, I, N
Rows (research): Reid, 2003; Elias & Cavana, 2000; Agle et al., 1999; Donaldson & Preston, 1995; Clarkson, 1995

* P = Partners/suppliers, E = Employees/Unions, C = Customers, S = Shareholders/investors, U = Education/research institutions, M = Media/Portals, G = Public/government, R = Recruiters, V = Reviewers, O = Competitors, T = Trade associations, F = Financial institutions, I = Political groups, N = SIG/Communities (Note that a class “Unknown” is not included here)


Comments on Stakeholder Research

• Stakeholder theories have strong explanatory power but are weak at practical classification of stakeholders

• Conclusions drawn from old data

• Previous research rarely considers the many opportunities offered by the Web for stakeholder analysis, e.g.,
  – Business intelligence, which is obtained from the business environment, is likely to help in stakeholder activities
  – Tools have been developed to exploit business intelligence but not yet applied to stakeholder analysis


BI and Stakeholder Analysis

• Advanced BI tools often rely on Web mining techniques to discover patterns on the Web automatically (Etzioni 1996; Kosala & Blockeel 2000), e.g.,
  – PageRank (Brin & Page 1998), HITS (Kleinberg 1999), Web IF (Ingwersen 1998)
  – External links mirror social communication phenomena (e.g., stakeholder relationships)

• Tools and approaches exploit Web content and link structure information
  – Ong et al. 2001; Tan et al. 2002; Reiterer et al. 2000; Chung et al. 2003; Reid 2003; Byrne 2003


Information on the Web

• Structural and textual content

• But commercial BI tools lack analysis capability (Fuld et al. 2002)

• Need to automate stakeholder classification, a primary step in stakeholder analysis
  – Automatic classification of Web pages is a promising way to alleviate the problem


Web Page Classification

• The process of assigning pages to predefined categories
  – Helps to discover companies’ stakeholders on the Web and enables companies to understand the competitive environment better

• Major approaches include k-nearest neighbor, neural network, Support Vector Machines, and Naïve Bayesian network (Chen & Chau 2004)

• Previous work
  – Kwon and Lee 2003; Mladenic 1998; Furnkranz 1999; Lee et al. 2002; Glover et al. 2002


Feature selection in Web Page Classification

• Features considered
  – Page textual content: full text, page title, headings
  – Link related textual content: anchor text, extended anchor text, URL strings
  – Page structural information: #words, #page out-links, inbound outlinks (i.e., links that point to its own company), outbound outlinks (i.e., links that point to external Web sites)

• Methods for selection
  – Human judgment / Use of domain lexicon
  – Feature ratios and thresholding
  – Frequency counting / MI
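The page structural information listed above can be illustrated with a small sketch. It is not taken from the slides: the function name `structural_counts`, the example hosts, and the rule that a link is "inbound" when its hostname equals or is a subdomain of the page's own host are all assumptions, and links are assumed to be absolute URLs.

```python
from urllib.parse import urlparse


def structural_counts(text, out_links, own_host):
    """Count words and split out-links into inbound vs. outbound by hostname."""
    own = own_host.lower()
    inbound = 0
    for link in out_links:
        host = (urlparse(link).hostname or "").lower()
        # Assumption: a link is "inbound" when it stays on the page's own site.
        if host == own or host.endswith("." + own):
            inbound += 1
    return {
        "n_words": len(text.split()),
        "n_out_links": len(out_links),
        "n_inbound": inbound,
        "n_outbound": len(out_links) - inbound,
    }


# Hypothetical page hosted on example.com with three out-links.
counts = structural_counts(
    "ClearForest provides tools for text analysis",
    [
        "http://www.example.com/about",
        "http://www.clearforest.com",
        "http://blog.example.com/post",
    ],
    "example.com",
)
print(counts)
```

Relative links carry no hostname and would be counted as outbound here; a fuller version would resolve them against the page URL first.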


Research Questions


Research Gaps

• Stakeholder research provides rich theoretical background but rarely considers the tremendous opportunities offered by the Web for stakeholder analysis
  – Conclusions drawn from old data may not reflect rapid development in e-commerce

• Existing BI tools lack stakeholder analysis capability

• Automatic Web page classification techniques are well developed but have not yet been applied to business stakeholder classification


Research Questions

• How can we develop an automated approach to business stakeholder analysis on the Web?

• How can Web page textual content and structural information be used in such an approach?

• What are the effectiveness (measured by accuracy) and efficiency (measured by time requirement) of such an approach for business stakeholder classification on the Web?


Research Approach and Testbed


Automatic Classification Approach

• Purpose: To automatically classify the stakeholders of businesses on the Web in order to facilitate stakeholder analysis

• Rationale
  – Business stakeholders should have identifiable clues that can be used to distinguish their types
  – The Web content and structural information is important for understanding the clues for stakeholder classification

• Two generic steps:
  – Creation of a domain lexicon that contains key textual attributes for identifying stakeholders
  – Automatic classification of Web pages (stakeholders) linking to selected companies based on textual and structural content of Web pages


Building a Research Testbed

• Business stakeholders of the KM World top 100 KM companies (McKellar 2003)

• Used backlink search function of the Google search engine to search for Web pages having hyperlinks pointing to the companies’ Web sites

• For each host company, we considered only the first 100 results returned
  – Removed self links and extra links from same sites
  – After filtering, we obtained 3,713 results in total
  – Randomly selected the results of 9 companies as training examples (414 283 pages stored in DB)
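The filtering step can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: `filter_backlinks` is a hypothetical helper, "same site" is taken to mean "same hostname", and the result URLs are invented.

```python
from urllib.parse import urlparse


def filter_backlinks(result_urls, company_host, limit=100):
    """Keep one backlink page per external site from the first `limit` results."""
    company = company_host.lower()
    seen_hosts = set()
    kept = []
    for url in result_urls[:limit]:
        host = (urlparse(url).hostname or "").lower()
        if not host:
            continue  # skip results without a usable hostname
        if host == company or host.endswith("." + company):
            continue  # self link: the page belongs to the host company
        if host in seen_hosts:
            continue  # extra link from a site that is already represented
        seen_hosts.add(host)
        kept.append(url)
    return kept


# Hypothetical backlink results for one host company.
results = [
    "http://www.clearforest.com/products",
    "http://news.example.org/a",
    "http://news.example.org/b",
    "http://blog.example.net/post",
]
print(filter_backlinks(results, "clearforest.com"))
```

Keeping only the first page seen per hostname is one possible reading of "extra links from same sites"; grouping by registered domain would be an equally plausible choice.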


Creation of a Domain Lexicon

• Manually read through all the Web pages of the nine companies’ business stakeholders to identify one-, two-, and three-word terms that were indicative of business stakeholder types

• Extracted a total of 329 terms (67 one-word terms, 84 two-word terms, and 178 three-word terms)


Automatic Stakeholder Classification

• Three steps:
  – Manual tagging
  – Feature selection
  – Automatic classification


Manual Tagging

• Manually classified each of the stakeholder pages of the nine selected companies into one of the 11 stakeholder types (based on our review of stakeholder types on slides 9-10, “Summary of stakeholder types” and “Comparing Stakeholder Types Used”)


Feature Selection

• Structural content features: binary variables indicating whether certain lexicon terms are present in the structural content
  – A term could be one, two, or three words long
  – Considered occurrences in title, extended anchor text, and full text

• Textual content features: frequencies of occurrences of the extracted features
  – The first set of features was selected based on human knowledge, while the second was selected based on statistical aggregation, thereby combining both kinds of knowledge
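A minimal sketch of the two feature sets, assuming a page has already been reduced to three text fields. The lexicon terms, the `extracted_terms` list, the field names, and whole-word matching are illustrative assumptions; the actual lexicon and the statistically extracted features are not reproduced in this transcript.

```python
import re

# Fields in which lexicon terms are looked up (title, extended anchor text, full text).
FIELDS = ("title", "anchor", "full_text")


def count_term(term, text):
    """Count whole-word, case-insensitive occurrences of a one- to three-word term."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def presence_features(page, lexicon):
    """Binary features: one 0/1 flag per (lexicon term, field) pair."""
    return [
        1 if count_term(term, page.get(field, "")) > 0 else 0
        for term in lexicon
        for field in FIELDS
    ]


def frequency_features(page, terms):
    """Frequency features: occurrence count of each term in the full text."""
    return [count_term(term, page.get("full_text", "")) for term in terms]


# Hypothetical lexicon, extracted terms, and page.
lexicon = ["news", "press release", "our business partner"]
extracted_terms = ["news", "archive"]
page = {
    "title": "Tech News Daily",
    "anchor": "a company that provides tools",
    "full_text": "News and press release archive. More news here.",
}
presence = presence_features(page, lexicon)
frequency = frequency_features(page, extracted_terms)
print(presence)   # [1, 0, 1, 0, 0, 1, 0, 0, 0]
print(frequency)  # [2, 1]
```

The two vectors can be used separately or concatenated, which corresponds to comparing the structural, textual, and combined feature sets.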


<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

<title>David Schatsky: Search and Discovery in the Post-Cold War Era</title> ...

<p>I just saw a demo by <a href = "http://www.clearforest.com"> ClearForest, </a> a company that provides tools for analyzing unstructured textual information. It's truly amazing, and truly the search tool for the post-Cold War era. ... </p> ...

</body>

</html>

An Example (a media type)

Link to the host company (ClearForest)

HTML hyperlink and extended anchor text
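One way to pull the title, the anchor text, and the extended anchor text out of a page like this is sketched below with Python's standard `html.parser`. The class name `LinkContextParser` is hypothetical, and treating the enclosing paragraph as the "extended anchor text" is an assumption; the slides do not say how wide that window is.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkContextParser(HTMLParser):
    """Collect the page title plus, for each link to a given host,
    the anchor text and the text of the enclosing paragraph."""

    def __init__(self, host):
        super().__init__()
        self.host = host.lower()
        self.title = ""
        self.links = []        # (anchor_text, paragraph_text) pairs
        self._in_title = False
        self._in_anchor = False
        self._para = []        # text chunks of the current paragraph
        self._anchors = []     # matching anchor texts in the current paragraph
        self._anchor_buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._flush()
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            link_host = (urlparse(href).hostname or "").lower()
            if link_host == self.host or link_host.endswith("." + self.host):
                self._in_anchor = True
                self._anchor_buf = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._in_anchor:
            self._in_anchor = False
            self._anchors.append(" ".join("".join(self._anchor_buf).split()))
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            self._para.append(data)
            if self._in_anchor:
                self._anchor_buf.append(data)

    def close(self):
        super().close()
        self._flush()  # emit a trailing paragraph that was never closed

    def _flush(self):
        paragraph = " ".join("".join(self._para).split())
        for anchor in self._anchors:
            self.links.append((anchor, paragraph))
        self._para, self._anchors = [], []


# Shortened, well-formed variant of the example page.
html = (
    "<html><head><title>David Schatsky: Search and Discovery in the "
    "Post-Cold War Era</title></head><body>"
    '<p>I just saw a demo by <a href="http://www.clearforest.com"> ClearForest, </a> '
    "a company that provides tools.</p></body></html>"
)
parser = LinkContextParser("clearforest.com")
parser.feed(html)
parser.close()
print(parser.title)
print(parser.links)
```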


Automatic Classification

• A feedforward/backpropagation neural network (Lippmann 1987) and SVM (Joachims, 1998) were used due to their robustness in automatic classification
  – Train the algorithms using the stakeholder pages of the 9 training companies and obtain a model or sets of weights for classification
  – Test the algorithms on sets of stakeholder pages of 10 companies different from training examples


Evaluation Methodology


Experimental Design

• Consisted of algorithm comparison, feature comparison, and a user evaluation study
  – Compared the performance of neural network (NN), SVM, baseline method (random classification), and human judgment
  – Compared structural content features, textual content features, and a combination of the two sets of features
  – 36 University of Arizona business students performed manual stakeholder classification and provided comments on the approach
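For reference, a random-classification baseline can be sketched in a few lines. The slides do not say how the baseline drew its guesses; the version below assumes a uniform choice among the 11 stakeholder types, and the label names are placeholders. Under that assumption the expected overall accuracy is 1/11, about 9.1%, whatever the true distribution of types.

```python
import random


def random_baseline(n_pages, labels, seed=0):
    """Assign each page a label drawn uniformly at random (assumed baseline)."""
    rng = random.Random(seed)
    return [rng.choice(labels) for _ in range(n_pages)]


# Placeholder names for the 11 stakeholder types.
labels = [f"type_{i}" for i in range(1, 12)]
guesses = random_baseline(50, labels)
expected_accuracy = 1 / len(labels)
print(len(guesses), round(expected_accuracy, 3))  # 50 0.091
```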


Performance Measures

• Effectiveness:
  – Overall accuracy
  – Within-class accuracy

• Efficiency: time used (in minutes)

• User subjective ratings and comments
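The two effectiveness measures can be computed as in the sketch below. The slides do not give formulas, so the definitions are assumptions: overall accuracy is the share of all pages classified correctly, and within-class accuracy is the share of pages of a given true type that were classified correctly. The labels are invented.

```python
from collections import defaultdict


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of all pages whose predicted type matches the true type."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def within_class_accuracy(true_labels, predicted_labels):
    """For each true type, the fraction of its pages classified correctly."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical manual tags and classifier output for four pages.
true_labels = ["media", "media", "partner", "customer"]
predicted = ["media", "partner", "partner", "media"]
print(overall_accuracy(true_labels, predicted))       # 0.5
print(within_class_accuracy(true_labels, predicted))  # {'media': 0.5, 'partner': 1.0, 'customer': 0.0}
```

Reporting both matters when types are unevenly distributed, because a classifier can score well overall while doing poorly on rare types.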


User Study

• Each subject was introduced to stakeholder analysis and was asked to use our system named “Business Stakeholder Analyzer (BSA)” to browse companies’ stakeholder lists

• We randomly selected three companies (Intelliseek, Siebel, and WebMethods) from the testing companies to be the targets of analysis


Hypotheses (1)

• H1: NN and SVM would achieve similar effectiveness when the same set of features was used
  – Both techniques were robust
  – Procedure: created 30 sets of stakeholder pages by randomly selecting groups of 5 stakeholder pages of each of the 10 testing companies
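The sampling procedure can be sketched as follows. The details are assumptions: pages are drawn without replacement within a set, the 30 sets are drawn independently of one another, and the company names and page identifiers are made up.

```python
import random


def make_test_sets(pages_by_company, n_sets=30, per_company=5, seed=0):
    """Build n_sets samples, each holding per_company random pages per company."""
    rng = random.Random(seed)
    test_sets = []
    for _ in range(n_sets):
        sample = []
        for company in sorted(pages_by_company):
            # Draw without replacement within one set (assumption).
            sample.extend(rng.sample(pages_by_company[company], per_company))
        test_sets.append(sample)
    return test_sets


# Hypothetical data: 10 testing companies with 20 stakeholder pages each.
pages_by_company = {
    f"company_{c}": [f"company_{c}/page_{i}" for i in range(20)]
    for c in range(10)
}
test_sets = make_test_sets(pages_by_company)
print(len(test_sets), len(test_sets[0]))  # 30 50
```

With 5 pages from each of 10 companies, every set holds 50 pages, giving 30 paired accuracy observations per algorithm for a significance test.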


Hypotheses (2)

• H2: NN and SVM would perform better than the baseline method
  – Incorporated human knowledge and machine learning capability into the classification

• H3: Human judgment in stakeholder classification would achieve effectiveness similar to that of machine learning, but would be less efficient
  – Humans could make use of the Web page’s textual and structural content in classifying stakeholders
  – Humans might spend more time on it


Hypotheses (3)

• H4 & H5 examined the use of different types of features in automatic stakeholder classification
  – H4: structural = textual
  – H5: combined > structural or textual alone


Experimental Results and Discussion


Algorithm Comparison

• H1 not confirmed

• NN performed significantly differently than SVM when the same set of features was used
  – NN performed significantly better than SVM when structural content features were used
  – SVM performed significantly better than NN when textual content features or a combination of both feature sets were used
  – More studies would be needed to identify optimal feature sets for each algorithm


Effectiveness of the Approach

• H2 confirmed

• The use of any combination of features and techniques in automatic stakeholder classification outperformed the baseline method significantly
  – Our approach has integrated human knowledge with machine-learned information related to stakeholder types …
  – and was significantly better than a random conjecture


Comparing with Human Judgment

• H3b and H3d (efficiency) confirmed
  – Human: 22 minutes (average), varied
  – Algorithms: 1 – 30 seconds (average)
  – Showing high efficiency of using the automatic approach to facilitate stakeholder analysis

• H3a and H3c (effectiveness) not confirmed
  – Humans were significantly more effective than NN or SVM
  – They could rely on more clues in performing classification
  – Experience in Internet browsing and searching helped narrow down choices


However, the algorithms achieved better within-class accuracies than humans in frequently occurring types …


Use of Features

• To our surprise, hypotheses H4a-b, H5a-b, and H5d were not confirmed
  – Different feature sets yielded different performances of the algorithms
    • Structural features enabled NN to achieve better effectiveness than textual ones
    • Textual and combined features enabled SVM to achieve better effectiveness than structural ones
  – Do not know exactly why
  – Future research: studying the effect of features and the nature of algorithms

• H5c was confirmed: structural content feature did not add value to the performance of SVM


Subjects’ Comments

• Overwhelmingly positive

• “It would be very helpful!”
• “That’s cool!”
• “I want to use it.”


Conclusions and Future Directions


Conclusions

• Proposed an automatic classification approach to business stakeholder analysis on the Web
  – Integrated human expert knowledge + machine-learned information
  – Promising in terms of effectiveness and efficiency

• A strong potential to use the approach to augment traditional stakeholder classification

• Could potentially facilitate business analysts’ interaction with automated stakeholder analysis systems in today’s networked enterprises


Future Directions

• To automate the next steps of business stakeholder analysis
  – With more expert participation and more Web page data

• Type-specific stakeholder analysis
  – e.g., partner relationships are often important in developing business strategies

• Automating cross-regional business stakeholder analysis
  – Study multinational business partnerships and cooperation and related HCI issues