Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0....

25
Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim, Germany Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany 1 / 19

Transcript of Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0....

Page 1: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics

Business Analytics0. Overview

Lars Schmidt-Thieme

Information Systems and Machine Learning Lab (ISMLL)University of Hildesheim, Germany

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

1 / 19

Page 2: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics

Outline

1. What is Business Analytics?

2. Overview of the Lecture

3. Organizational Stuff

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

2 / 19

Page 3: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 1. What is Business Analytics?

Outline

1. What is Business Analytics?

2. Overview of the Lecture

3. Organizational Stuff

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

2 / 19

Page 4: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 1. What is Business Analytics?

Example 1: Demand Forecasting

Time

elec

tric

ity d

eman

d

2500

030

000

3500

040

000

4500

050

000

5500

0

Jan 01 Jan 02 Jan 04 Jan 06 Jan 08 Jan 10 Jan 12 Jan 14 Jan 16 Jan 18 Jan 20 Jan 22 Jan 24 Jan 26 Jan 28 Jan 30 Feb 01 Feb 03 Feb 05 Feb 07 Feb 09 Feb 11

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

2 / 19

Note: Metered half-hourly electricity demands, nationalgrid, UK, 2013.

Page 5: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 1. What is Business Analytics?

Example 2: Recommender Systems

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

3 / 19

Page 6: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 1. What is Business Analytics?

Example 3: Opinion Mining

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

4 / 19

Note: Example opinion taken from amazon.com.

Page 7: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 1. What is Business Analytics?

Business Analytics and Machine Learning

O P T I M I Z A T I O N

InformationSystems

Robotics Bioinformatics

ManyFurtherApplications!

M A C H I N E L E A R N I N G

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

5 / 19

Page 8: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 1. What is Business Analytics?

Business Analytics and Machine Learning

O P T I M I Z A T I O N

InformationSystems

Robotics Bioinformatics

ManyFurtherApplications!

M A C H I N E L E A R N I N G

B U S I N E S S

A N A L Y T I C S

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

5 / 19

Page 9: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 1. What is Business Analytics?

Business Analytics and Business IntelligenceBusiness Intelligence

I Business Intelligence and Business Analytics share the same goal:to manage, process, and analyze large-scale data for supportingoperations and management of businesses.

I Business Intelligence usually is focused onI data preprocessing (extract-transform-load, ETL),I data management, esp. data warehouses (DWH),I reporting, esp. Online Analytical Processing (OLAP),

I Business Analytics is focused onI analyzing the data (data mining, machine learning),I supporting a specific business process.

I Business Analytics also adopts Big Data Analytics that usesdifferent data management and processing infrastructure from datawarehouses.

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

6 / 19

Page 10: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 2. Overview of the Lecture

Outline

1. What is Business Analytics?

2. Overview of the Lecture

3. Organizational Stuff

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

7 / 19

Page 11: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 2. Overview of the Lecture

Topics of the Lecture (Current Planning)

1. Basics of Prediction(supervised learning, regression and classification)

2. Cluster Analysis

3. Dimensionality Reduction and Visualization

4. Frequent Pattern Mining

5. Recommender Systems

6. Time Series Forecasting

7. Big Data Analytics

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

7 / 19

Page 12: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 2. Overview of the Lecture

1. Basics of Prediction

test set Dtest:y (price [$]) y

188,300 126,200102,700 126,200172,500 126,200127,700 126,200

97,800 126,200143,100 126,200116,500 126,200142,600 126,200157,100 126,200

......

MAE: 21,267.44RMSE: 28,052.88

● ● ●●

● ● ● ● ●●

● ● ● ● ●●

● ● ● ●● ●

● ●●

● ●● ●

● ●● ●

●●

0 10 20 30 40

8000

010

0000

1200

0014

0000

1600

0018

0000

2000

00

rank

pric

e

meanmedian

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

8 / 19

Page 13: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 2. Overview of the Lecture

2. Cluster Analysis

Ariz

ona

New

Mex

ico

Tex

asLo

uisi

ana

Mis

siss

ippi

Sou

th C

arol

ina

Geo

rgia

Ala

bam

aN

orth

Car

olin

aA

rkan

sas

Wes

t Virg

inia

Ken

tuck

yT

enne

ssee

Ala

ska

Nev

ada

Haw

aii

Mai

neO

klah

oma

Rho

de Is

land

Mis

sour

iP

enns

ylva

nia

Indi

ana

Ohi

oD

elaw

are

Mic

higa

nN

ew J

erse

yIll

inoi

sM

aryl

and

Virg

inia

Flo

rida

New

Yor

kC

onne

ctic

utN

orth

Dak

ota

Uta

hC

alifo

rnia

Mas

sach

uset

tsC

olor

ado

Was

hing

ton

Min

neso

taO

rego

nN

ebra

ska

Iow

aK

ansa

sW

yom

ing

Mon

tana

New

Ham

pshi

reId

aho

Ver

mon

tS

outh

Dak

ota

Wis

cons

in

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

Cluster Dendrogram

hclust (*, "average")US states

Hei

ght

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

9 / 19

Page 14: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 2. Overview of the Lecture

3. Dimensionality Reduction and Visualization

−0.2 −0.1 0.0 0.1

−0.

2−

0.1

0.0

0.1

Comp.1

Com

p.2

1

2

3

45

6

7

8

9 1011

12

13

14

15

16

17

18

1920

21

22

2324

25

2627

28

29

30

31 32

33

34

3536

37

38

39

40

41

42

43

44

45

46

47

48

49

5051

52

53

54

55

56

5758

596061

6263

64

65

66

67

68

69

7071

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

9394

95

96

97

9899

100

101102

103

104

105

106 107

108

109110

111

112113

114

115

116

117

118

119

120

121122

123

124125

126

127

128

−10 −5 0 5

−10

−5

05

Price

SqFt

BedroomsBathrooms

Offers

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

10 / 19

Page 15: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 2. Overview of the Lecture

4. Frequent Pattern MiningAssociation rules in large transaction datasets:

I look for products frequently bought together (frequent itemsets).

I look for rules in buying behavior (association rules)

Examples:

I {beer, pampers, pizza} (support=0.5){bread,milk} (support=0.5)

I If beer and pampers, then pizza (confidence= 0.75)If bread, then milk (confidence=0.75)

cid beer bread icecream milk pampers pizza

1 + – – + + +2 + + – – + +3 + – + – + +4 – + – + – +5 – + + + – –6 + + – + + –

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

11 / 19

Page 16: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 2. Overview of the Lecture

5. Recommender Systems

item 1 2 3 4 5 6 7 8 9 . . .year ’95 ’95 ’95 ’95 ’95 ’95 ’95 ’95 ’95

action – + – + – – – – –children’s + – – – – – – + –

. . .

gend

erag

e

. . . item 1 2 3 4 5 6 7 8 9 . . .user 1 m 24 user 1 4 3

2 f 53 2 13 m 23 3 44 m 24 4 55 f 33 5 26 m 42 6 4...

...

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

12 / 19

Page 17: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 2. Overview of the Lecture

6. Time Series Forecasting25

000

3500

045

000

5500

0

obse

rved

3600

040

000

4400

0

tren

d

−10

000

050

00

seas

onal

−10

000

050

00

2013 2014 2015 2016 2017 2018 2019 2020

rand

om

Time

Decomposition of additive time series

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

13 / 19

Page 18: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 2. Overview of the Lecture

7. Big Data AnalyticsUser

Program

Master

(1) fork

worker

(1) fork

worker

(1) fork

(2)

assign

map

(2)

assign

reduce

split 0

split 1

split 2

split 3

split 4

output

file 0

(6) write

worker(3) read

worker

(4) local write

Map

phase

Intermediate files

(on local disks)

worker output

file 1

Input

files

(5) remote read

Reduce

phase

Output

files

Figure 1: Execution overview

Inverted Index: The map function parses each docu-

ment, and emits a sequence of 〈word,document ID〉pairs. The reduce function accepts all pairs for a given

word, sorts the corresponding document IDs and emits a

〈word, list(document ID)〉 pair. The set of all outputpairs forms a simple inverted index. It is easy to augment

this computation to keep track of word positions.

Distributed Sort: The map function extracts the key

from each record, and emits a 〈key,record〉 pair. Thereduce function emits all pairs unchanged. This compu-

tation depends on the partitioning facilities described in

Section 4.1 and the ordering properties described in Sec-

tion 4.2.

3 Implementation

Many different implementations of the MapReduce in-

terface are possible. The right choice depends on the

environment. For example, one implementation may be

suitable for a small shared-memory machine, another for

a large NUMA multi-processor, and yet another for an

even larger collection of networked machines.

This section describes an implementation targeted

to the computing environment in wide use at Google:

large clusters of commodity PCs connected together with

switched Ethernet [4]. In our environment:

(1)Machines are typically dual-processor x86 processors

running Linux, with 2-4 GB of memory per machine.

(2) Commodity networking hardware is used – typically

either 100 megabits/second or 1 gigabit/second at the

machine level, but averaging considerably less in over-

all bisection bandwidth.

(3) A cluster consists of hundreds or thousands of ma-

chines, and therefore machine failures are common.

(4) Storage is provided by inexpensive IDE disks at-

tached directly to individual machines. A distributed file

system [8] developed in-house is used to manage the data

stored on these disks. The file system uses replication to

provide availability and reliability on top of unreliable

hardware.

(5) Users submit jobs to a scheduling system. Each job

consists of a set of tasks, and is mapped by the scheduler

to a set of available machines within a cluster.

3.1 Execution Overview

The Map invocations are distributed across multiple

machines by automatically partitioning the input data

To appear in OSDI 2004 3

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

14 / 19

Note: from [DG04].

Page 19: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 3. Organizational Stuff

Outline

1. What is Business Analytics?

2. Overview of the Lecture

3. Organizational Stuff

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

15 / 19

Page 20: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 3. Organizational Stuff

Exercises and Tutorials

I There will be a weekly sheet with two exerciseshanded out each Tuesday in the lecture.1st sheet will be handed out next Tue. 30.4.

I Solutions to the exercises can besubmitted until next Monday noon1st sheet is due Mon. 6.5.

I Exercises will be corrected.

I Tutorials each Tuesday 12pm–2pm,1st tutorial at Tue. 30.4..

I Successful participation in the tutorial gives up to 10% bonus pointsfor the exam.

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

15 / 19

Page 21: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 3. Organizational Stuff

Exam and Credit Points

I There will be a written exam at end of term(2h, 4 problems).

I The course gives 8 ECTS (3+2 SWS).

I The course can be used inI Wirtschaftsinformatik MSc / Wirtschaftsinformatik / Gebiet BII IMIT MSc. / Informatik / Gebiet KI & MLI as well as in both BSc programs.I It replaces the former separate courses “Business Intelligence” (BSc)

and “Business Analytics” (MSc).

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

16 / 19

Page 22: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 3. Organizational Stuff

Some Books

I Wolfgang Jank (2011):Business Analytics for Managers, Springer.

I James R. Evans (2013):Business Analytics, Methods, Models, and Decision, Pearson.

I Mike Biere (2011):The New Era of Enterprise Business Intelligence, Using Analytics toAchieve a Global Competitive Advantage, IBM Press.

I Daniel S. Putler, Robert E. Krider (2012):Customer and Business Analytics, Applied Data Mining for BusinessDecision Making Using R, CRC Press.

I A Ohri (2012):R for Business Analytics, Springer.

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

17 / 19

Page 23: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 3. Organizational Stuff

More Books

I Trevor Hastie, Robert Tibshirani, Jerome Friedman (22009):The Elements of Statistical Learning, Springer.Also available online as PDF athttp://www-stat.stanford.edu/~tibs/ElemStatLearn/

I Kevin P. Murphy (2012):Machine Learning, A Probabilistic Approach, MIT Press.

I Christopher M. Bishop (2007):Pattern Recognition and Machine Learning, Springer.

I Richard O. Duda, Peter E. Hart, David G. Stork (22001):Pattern Classification, Springer.

I W. N. Venables, B. D. Ripley (2002):Modern Applied Statistics with S, Springer.

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

18 / 19

Page 24: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics 3. Organizational Stuff

Some First Machine Learning Software

I R (v3.0.0, 3.4.2013; http://www.r-project.org).

I Weka (v3.6.9, 22.1.2013;http://www.cs.waikato.ac.nz/~ml/).

I SAS Enterprise Miner (commercially).

Public data sets:

I UCI Machine Learning Repository(http://www.ics.uci.edu/~mlearn/)

I UCI Knowledge Discovery in Databases Archive(http://kdd.ics.uci.edu/)

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

19 / 19

Page 25: Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) University

Business Analytics

References

Jeffrey Dean and Sanjay Ghemawat.

MapReduce: simplified data processing on large clusters.In OSDI’04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation,volume 6, 2004.

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

20 / 19