Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0....
Transcript of Business Analytics - 0. Overview - Universität …...Business Analytics Business Analytics 0....
Business Analytics
Business Analytics0. Overview
Lars Schmidt-Thieme
Information Systems and Machine Learning Lab (ISMLL)University of Hildesheim, Germany
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
1 / 19
Business Analytics
Outline
1. What is Business Analytics?
2. Overview of the Lecture
3. Organizational Stuff
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
2 / 19
Business Analytics 1. What is Business Analytics?
Outline
1. What is Business Analytics?
2. Overview of the Lecture
3. Organizational Stuff
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
2 / 19
Business Analytics 1. What is Business Analytics?
Example 1: Demand Forecasting
Time
elec
tric
ity d
eman
d
2500
030
000
3500
040
000
4500
050
000
5500
0
Jan 01 Jan 02 Jan 04 Jan 06 Jan 08 Jan 10 Jan 12 Jan 14 Jan 16 Jan 18 Jan 20 Jan 22 Jan 24 Jan 26 Jan 28 Jan 30 Feb 01 Feb 03 Feb 05 Feb 07 Feb 09 Feb 11
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
2 / 19
Note: Metered half-hourly electricity demands, nationalgrid, UK, 2013.
Business Analytics 1. What is Business Analytics?
Example 2: Recommender Systems
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
3 / 19
Business Analytics 1. What is Business Analytics?
Example 3: Opinion Mining
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
4 / 19
Note: Example opinion taken from amazon.com.
Business Analytics 1. What is Business Analytics?
Business Analytics and Machine Learning
O P T I M I Z A T I O N
InformationSystems
Robotics Bioinformatics
ManyFurtherApplications!
M A C H I N E L E A R N I N G
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
5 / 19
Business Analytics 1. What is Business Analytics?
Business Analytics and Machine Learning
O P T I M I Z A T I O N
InformationSystems
Robotics Bioinformatics
ManyFurtherApplications!
M A C H I N E L E A R N I N G
B U S I N E S S
A N A L Y T I C S
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
5 / 19
Business Analytics 1. What is Business Analytics?
Business Analytics and Business IntelligenceBusiness Intelligence
I Business Intelligence and Business Analytics share the same goal:to manage, process, and analyze large-scale data for supportingoperations and management of businesses.
I Business Intelligence usually is focused onI data preprocessing (extract-transform-load, ETL),I data management, esp. data warehouses (DWH),I reporting, esp. Online Analytical Processing (OLAP),
I Business Analytics is focused onI analyzing the data (data mining, machine learning),I supporting a specific business process.
I Business Analytics also adopts Big Data Analytics that usesdifferent data management and processing infrastructure from datawarehouses.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
6 / 19
Business Analytics 2. Overview of the Lecture
Outline
1. What is Business Analytics?
2. Overview of the Lecture
3. Organizational Stuff
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
7 / 19
Business Analytics 2. Overview of the Lecture
Topics of the Lecture (Current Planning)
1. Basics of Prediction(supervised learning, regression and classification)
2. Cluster Analysis
3. Dimensionality Reduction and Visualization
4. Frequent Pattern Mining
5. Recommender Systems
6. Time Series Forecasting
7. Big Data Analytics
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
7 / 19
Business Analytics 2. Overview of the Lecture
1. Basics of Prediction
test set Dtest:y (price [$]) y
188,300 126,200102,700 126,200172,500 126,200127,700 126,200
97,800 126,200143,100 126,200116,500 126,200142,600 126,200157,100 126,200
......
MAE: 21,267.44RMSE: 28,052.88
●
●
● ● ●●
● ● ● ● ●●
● ● ● ● ●●
● ● ● ●● ●
● ●●
● ●● ●
● ●● ●
●
●
●
●●
●
●
●
0 10 20 30 40
8000
010
0000
1200
0014
0000
1600
0018
0000
2000
00
rank
pric
e
meanmedian
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
8 / 19
Business Analytics 2. Overview of the Lecture
2. Cluster Analysis
Ariz
ona
New
Mex
ico
Tex
asLo
uisi
ana
Mis
siss
ippi
Sou
th C
arol
ina
Geo
rgia
Ala
bam
aN
orth
Car
olin
aA
rkan
sas
Wes
t Virg
inia
Ken
tuck
yT
enne
ssee
Ala
ska
Nev
ada
Haw
aii
Mai
neO
klah
oma
Rho
de Is
land
Mis
sour
iP
enns
ylva
nia
Indi
ana
Ohi
oD
elaw
are
Mic
higa
nN
ew J
erse
yIll
inoi
sM
aryl
and
Virg
inia
Flo
rida
New
Yor
kC
onne
ctic
utN
orth
Dak
ota
Uta
hC
alifo
rnia
Mas
sach
uset
tsC
olor
ado
Was
hing
ton
Min
neso
taO
rego
nN
ebra
ska
Iow
aK
ansa
sW
yom
ing
Mon
tana
New
Ham
pshi
reId
aho
Ver
mon
tS
outh
Dak
ota
Wis
cons
in
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
Cluster Dendrogram
hclust (*, "average")US states
Hei
ght
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
9 / 19
Business Analytics 2. Overview of the Lecture
3. Dimensionality Reduction and Visualization
−0.2 −0.1 0.0 0.1
−0.
2−
0.1
0.0
0.1
Comp.1
Com
p.2
1
2
3
45
6
7
8
9 1011
12
13
14
15
16
17
18
1920
21
22
2324
25
2627
28
29
30
31 32
33
34
3536
37
38
39
40
41
42
43
44
45
46
47
48
49
5051
52
53
54
55
56
5758
596061
6263
64
65
66
67
68
69
7071
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
9394
95
96
97
9899
100
101102
103
104
105
106 107
108
109110
111
112113
114
115
116
117
118
119
120
121122
123
124125
126
127
128
−10 −5 0 5
−10
−5
05
Price
SqFt
BedroomsBathrooms
Offers
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
10 / 19
Business Analytics 2. Overview of the Lecture
4. Frequent Pattern MiningAssociation rules in large transaction datasets:
I look for products frequently bought together (frequent itemsets).
I look for rules in buying behavior (association rules)
Examples:
I {beer, pampers, pizza} (support=0.5){bread,milk} (support=0.5)
I If beer and pampers, then pizza (confidence= 0.75)If bread, then milk (confidence=0.75)
cid beer bread icecream milk pampers pizza
1 + – – + + +2 + + – – + +3 + – + – + +4 – + – + – +5 – + + + – –6 + + – + + –
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
11 / 19
Business Analytics 2. Overview of the Lecture
5. Recommender Systems
item 1 2 3 4 5 6 7 8 9 . . .year ’95 ’95 ’95 ’95 ’95 ’95 ’95 ’95 ’95
action – + – + – – – – –children’s + – – – – – – + –
. . .
gend
erag
e
. . . item 1 2 3 4 5 6 7 8 9 . . .user 1 m 24 user 1 4 3
2 f 53 2 13 m 23 3 44 m 24 4 55 f 33 5 26 m 42 6 4...
...
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
12 / 19
Business Analytics 2. Overview of the Lecture
6. Time Series Forecasting25
000
3500
045
000
5500
0
obse
rved
3600
040
000
4400
0
tren
d
−10
000
050
00
seas
onal
−10
000
050
00
2013 2014 2015 2016 2017 2018 2019 2020
rand
om
Time
Decomposition of additive time series
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
13 / 19
Business Analytics 2. Overview of the Lecture
7. Big Data AnalyticsUser
Program
Master
(1) fork
worker
(1) fork
worker
(1) fork
(2)
assign
map
(2)
assign
reduce
split 0
split 1
split 2
split 3
split 4
output
file 0
(6) write
worker(3) read
worker
(4) local write
Map
phase
Intermediate files
(on local disks)
worker output
file 1
Input
files
(5) remote read
Reduce
phase
Output
files
Figure 1: Execution overview
Inverted Index: The map function parses each docu-
ment, and emits a sequence of 〈word,document ID〉pairs. The reduce function accepts all pairs for a given
word, sorts the corresponding document IDs and emits a
〈word, list(document ID)〉 pair. The set of all outputpairs forms a simple inverted index. It is easy to augment
this computation to keep track of word positions.
Distributed Sort: The map function extracts the key
from each record, and emits a 〈key,record〉 pair. Thereduce function emits all pairs unchanged. This compu-
tation depends on the partitioning facilities described in
Section 4.1 and the ordering properties described in Sec-
tion 4.2.
3 Implementation
Many different implementations of the MapReduce in-
terface are possible. The right choice depends on the
environment. For example, one implementation may be
suitable for a small shared-memory machine, another for
a large NUMA multi-processor, and yet another for an
even larger collection of networked machines.
This section describes an implementation targeted
to the computing environment in wide use at Google:
large clusters of commodity PCs connected together with
switched Ethernet [4]. In our environment:
(1)Machines are typically dual-processor x86 processors
running Linux, with 2-4 GB of memory per machine.
(2) Commodity networking hardware is used – typically
either 100 megabits/second or 1 gigabit/second at the
machine level, but averaging considerably less in over-
all bisection bandwidth.
(3) A cluster consists of hundreds or thousands of ma-
chines, and therefore machine failures are common.
(4) Storage is provided by inexpensive IDE disks at-
tached directly to individual machines. A distributed file
system [8] developed in-house is used to manage the data
stored on these disks. The file system uses replication to
provide availability and reliability on top of unreliable
hardware.
(5) Users submit jobs to a scheduling system. Each job
consists of a set of tasks, and is mapped by the scheduler
to a set of available machines within a cluster.
3.1 Execution Overview
The Map invocations are distributed across multiple
machines by automatically partitioning the input data
To appear in OSDI 2004 3
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
14 / 19
Note: from [DG04].
Business Analytics 3. Organizational Stuff
Outline
1. What is Business Analytics?
2. Overview of the Lecture
3. Organizational Stuff
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
15 / 19
Business Analytics 3. Organizational Stuff
Exercises and Tutorials
I There will be a weekly sheet with two exerciseshanded out each Tuesday in the lecture.1st sheet will be handed out next Tue. 30.4.
I Solutions to the exercises can besubmitted until next Monday noon1st sheet is due Mon. 6.5.
I Exercises will be corrected.
I Tutorials each Tuesday 12pm–2pm,1st tutorial at Tue. 30.4..
I Successful participation in the tutorial gives up to 10% bonus pointsfor the exam.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
15 / 19
Business Analytics 3. Organizational Stuff
Exam and Credit Points
I There will be a written exam at end of term(2h, 4 problems).
I The course gives 8 ECTS (3+2 SWS).
I The course can be used inI Wirtschaftsinformatik MSc / Wirtschaftsinformatik / Gebiet BII IMIT MSc. / Informatik / Gebiet KI & MLI as well as in both BSc programs.I It replaces the former separate courses “Business Intelligence” (BSc)
and “Business Analytics” (MSc).
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
16 / 19
Business Analytics 3. Organizational Stuff
Some Books
I Wolfgang Jank (2011):Business Analytics for Managers, Springer.
I James R. Evans (2013):Business Analytics, Methods, Models, and Decision, Pearson.
I Mike Biere (2011):The New Era of Enterprise Business Intelligence, Using Analytics toAchieve a Global Competitive Advantage, IBM Press.
I Daniel S. Putler, Robert E. Krider (2012):Customer and Business Analytics, Applied Data Mining for BusinessDecision Making Using R, CRC Press.
I A Ohri (2012):R for Business Analytics, Springer.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
17 / 19
Business Analytics 3. Organizational Stuff
More Books
I Trevor Hastie, Robert Tibshirani, Jerome Friedman (22009):The Elements of Statistical Learning, Springer.Also available online as PDF athttp://www-stat.stanford.edu/~tibs/ElemStatLearn/
I Kevin P. Murphy (2012):Machine Learning, A Probabilistic Approach, MIT Press.
I Christopher M. Bishop (2007):Pattern Recognition and Machine Learning, Springer.
I Richard O. Duda, Peter E. Hart, David G. Stork (22001):Pattern Classification, Springer.
I W. N. Venables, B. D. Ripley (2002):Modern Applied Statistics with S, Springer.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
18 / 19
Business Analytics 3. Organizational Stuff
Some First Machine Learning Software
I R (v3.0.0, 3.4.2013; http://www.r-project.org).
I Weka (v3.6.9, 22.1.2013;http://www.cs.waikato.ac.nz/~ml/).
I SAS Enterprise Miner (commercially).
Public data sets:
I UCI Machine Learning Repository(http://www.ics.uci.edu/~mlearn/)
I UCI Knowledge Discovery in Databases Archive(http://kdd.ics.uci.edu/)
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
19 / 19
Business Analytics
References
Jeffrey Dean and Sanjay Ghemawat.
MapReduce: simplified data processing on large clusters.In OSDI’04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation,volume 6, 2004.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
20 / 19