A Measurement Based Memory Performance Evaluation of Streaming Media Servers
Garba Isa Yau and Abdul Waheed
Department of Computer Engineering
King Fahd University of Petroleum & Minerals
Dhahran, Saudi Arabia
Presented at the 10th Annual IEEE Technical Exchange Meeting, March 23-24, 2003
Outline
• Introduction
• Motivation
• Experiments
• Results and Discussion
• Operating System Impact on Performance
• Conclusions and Future Research
Introduction
Basic architecture
[Figure: a streaming media server (disk, memory, network interface) delivers media over the network to a media client, with a separate control path]
• Unlike ordinary file downloads or Web applications, streaming media have:
  – stringent timing requirements
  – high bandwidth requirements
  – high CPU demand
  – high memory requirements
Motivation
• CPU–memory speed gap:
  – CPU speed doubles about every 18 months (Moore’s Law)
  – memory access time improves by only about one-third in 10 years
• Hierarchical memory architectures (caches) were introduced to alleviate the CPU–memory speed gap
  – they rely on locality of reference: temporal locality and spatial locality
• Streaming media content is continuous data:
  – its working set is normally large and cannot fit into the cache
  – it has very poor temporal locality (little data reuse)
• As a result, the hierarchical memory architecture becomes ineffective for streaming workloads
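The locality argument above can be illustrated with a toy cache simulation. This is a minimal sketch, not part of the original study: it assumes a fully associative LRU cache, each address stands for one cache line (so spatial locality within a line is abstracted away), and the 512-line capacity and the two access patterns are hypothetical.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity_lines):
    """Fraction of accesses that hit in a fully associative LRU cache (assumed model)."""
    cache = OrderedDict()  # cache-line ids, ordered from least to most recently used
    hits = 0
    for line in accesses:
        if line in cache:
            hits += 1
            cache.move_to_end(line)  # mark as most recently used
        else:
            cache[line] = None
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(accesses) if accesses else 0.0


def looping(working_set_lines, passes):
    """Hypothetical pattern: re-read the same working set several times (data reuse)."""
    return [i for _ in range(passes) for i in range(working_set_lines)]


def streaming(total_lines):
    """Hypothetical pattern: touch each line of a long media stream exactly once."""
    return list(range(total_lines))


if __name__ == "__main__":
    CACHE_LINES = 512  # assumed cache capacity, in lines
    print(lru_hit_rate(looping(256, 10), CACHE_LINES))   # working set fits in cache
    print(lru_hit_rate(streaming(2560), CACHE_LINES))    # no reuse at all
    print(lru_hit_rate(looping(1024, 10), CACHE_LINES))  # working set exceeds cache
```

With a 512-line cache, re-reading a 256-line working set ten times hits 90% of the time, while a single streaming pass over 2,560 lines, or repeated passes over a 1,024-line working set that exceeds the cache, gets no hits at all under LRU.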
Experiments
• Metrics:
  – L1 and L2 cache misses
  – page fault rate
  – throughput
  – server CPU utilization
• Factors:
  – number of streams
  – media encoding rate (56 kbps and 300 kbps)
  – stream distribution (unique or multiple)
[Figure: testbed. A dual-boot server (Windows 2000/Linux) and dual-boot client machines (Windows 2000/Linux)]
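The three factors above define the experiment configurations for each server. Assuming a full factorial design, which the slides do not state explicitly, and taking the stream counts from the x-axes of the result plots, the grid can be enumerated as below; the helper name is hypothetical.

```python
from itertools import product

# Factor levels from the slides; stream counts are the x-axis values of the result plots.
ENCODING_RATES_KBPS = (56, 300)
DISTRIBUTIONS = ("unique", "multiple")
STREAM_COUNTS = (1, 10, 100, 200, 300, 400, 500, 600, 700, 1000)


def experiment_grid():
    """Hypothetical helper: every combination of the three factors (full factorial, assumed)."""
    return [
        {"rate_kbps": rate, "distribution": dist, "streams": n}
        for rate, dist, n in product(ENCODING_RATES_KBPS, DISTRIBUTIONS, STREAM_COUNTS)
    ]


if __name__ == "__main__":
    grid = experiment_grid()
    print(len(grid), "configurations per server")
    for cfg in grid[:3]:
        print(cfg)
```

Under this assumption there are 2 × 2 × 10 = 40 configurations per server.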
Experiments cont.
• Servers: Apple Darwin Streaming Server (DSS) and Microsoft Windows Media Server (WMS)
• Clients: Streaming Load Simulator (for DSS) and Media Load Simulator (for WMS)
• Tools: Intel VTune Performance Analyzer, Windows Performance Monitor, netstat, vmstat, sar, etc.
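As one possible way to collect the page fault rate on the Linux side of the testbed, the sketch below takes two snapshots of the kernel's cumulative counters in /proc/vmstat and differences them. This is an assumption for illustration: the slides name vmstat and sar, not /proc/vmstat, and on Windows the comparable counter would be Performance Monitor's Memory\Page Faults/sec. In /proc/vmstat, pgfault counts all page faults and pgmajfault counts only major faults that require disk I/O; which of these the slides report is not stated. The function names are hypothetical.

```python
import time


def parse_vmstat(text: str) -> dict:
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def fault_rates(before: str, after: str, interval_s: float) -> tuple:
    """Return (all page faults/sec, major page faults/sec) between two snapshots."""
    b, a = parse_vmstat(before), parse_vmstat(after)
    all_faults = (a["pgfault"] - b["pgfault"]) / interval_s
    major_faults = (a["pgmajfault"] - b["pgmajfault"]) / interval_s
    return all_faults, major_faults


def sample_fault_rates(interval_s: float = 1.0) -> tuple:
    """Linux only: snapshot /proc/vmstat, wait, snapshot again, and compute rates."""
    start = time.monotonic()
    with open("/proc/vmstat") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/vmstat") as f:
        after = f.read()
    elapsed = time.monotonic() - start  # actual elapsed time, not the nominal interval
    return fault_rates(before, after, elapsed)
```

Sampling once per second while the load simulator runs would give a page-faults-per-second series comparable in form to the page fault plot.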
Results and Discussion
• L1 Cache Performance
[Figure: L1 cache misses (56 kbps). Number of cache misses (millions) vs. number of streams (clients); series: dss unique, dss multiple, wms unique, wms multiple]
[Figure: L1 cache misses (300 kbps). Same axes and series]
• L1 cache misses are influenced mostly by the number of streams
• Worst-case performance occurs when the number of streams is high, the encoding rate is 300 kbps, and clients request multiple media contents
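The observation that L1 misses are driven mainly by the number of streams is consistent with a back-of-the-envelope bound: if every byte of streamed media passes through the cache at least once with no reuse, the server must fill at least n × r / (8 × line size) cache lines per second for n streams at r bits per second. The sketch computes that lower bound; the 64-byte line size, 1 kbps = 1,000 bit/s, and the single pass over each byte are assumed simplifications. Real servers typically copy the data several times between disk buffers, user space, and socket buffers, so actual miss counts would be higher.

```python
def min_line_fills_per_sec(streams, rate_kbps, line_bytes=64):
    """Lower bound on cache-line fills/sec if each streamed byte is touched once (assumed)."""
    bytes_per_sec = streams * rate_kbps * 1000 / 8  # kbps -> bytes per second
    return bytes_per_sec / line_bytes              # 64-byte lines is an assumption


if __name__ == "__main__":
    for rate in (56, 300):
        for n in (1, 100, 1000):
            print(rate, "kbps", n, "streams:", min_line_fills_per_sec(n, rate))
```

For 1,000 streams this gives 109,375 line fills per second at 56 kbps and 585,937.5 at 300 kbps, and the bound grows linearly with the number of streams.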
Results and Discussion cont.
• L2 Cache Performance
[Figure: L2 cache misses (300 kbps). Number of cache misses (millions) vs. number of streams (clients); series: dss unique, dss multiple, wms unique, wms multiple]
• Comparison: for both L1 and L2 caches, Windows Media Server has better cache performance than Darwin Streaming Server
Results and Discussion cont.
• Memory Performance
[Figure: Page fault rate (300 kbps). Page faults/sec vs. number of streams (clients); series: dss unique, dss multiple, wms unique, wms multiple]
• Requests for a unique media object incur few page faults, since the object can be served from memory
• Requests for multiple objects lead to a high page fault rate, since many data blocks must be fetched from disk
• A high page fault rate causes client timeouts due to long delays
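The link between page faults and client timeouts can be made concrete with the standard effective-access-time formula, EAT = (1 − p) × t_mem + p × t_fault, where p is the probability that a memory access causes a page fault. The sketch uses illustrative textbook-style values (200 ns memory access, 8 ms fault service time) that are assumptions, not measurements from this testbed.

```python
def effective_access_time_ns(fault_prob, mem_ns=200.0, fault_service_ns=8_000_000.0):
    """EAT = (1 - p) * t_mem + p * t_fault; default timings are assumed, not measured."""
    return (1 - fault_prob) * mem_ns + fault_prob * fault_service_ns


if __name__ == "__main__":
    for p in (0.0, 0.0001, 0.001):
        print(p, effective_access_time_ns(p))
```

With these values, one fault per 1,000 accesses raises the effective access time from 200 ns to about 8,200 ns, roughly a 41-fold slowdown, which is the kind of delay that pushes streams past client timeouts.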
Results and Discussion cont.
• Throughput and CPU Utilization
[Figure: Throughput (300 kbps). Throughput (kbps) vs. number of streams (clients); series: dss unique, dss multiple, wms unique, wms multiple]
[Figure: CPU utilization (300 kbps). CPU utilization (%) vs. number of streams (clients); same series]
• Windows Media Server achieves higher throughput than Darwin Streaming Server
• For unique streams, CPU utilization scales with the number of streams across the whole range; this is not the case for multiple streams
Memory Transfer Test
• ECT (extended copy transfer): characterizes memory performance to observe the impact of the OS on memory performance
[Figure: memory bandwidth (MB/s) vs. block size (working set), Linux vs. Windows]
• Locality of reference:
  – temporal locality: varying the working set size (block size)
  – spatial locality: varying the access pattern (strides)
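An ECT-style measurement sweeps the working set size to probe temporal locality and the stride to probe spatial locality. The sketch below imitates that sweep with Python bytearray slice copies; it is not the ECT tool, and because each pass also builds a temporary strided copy and includes interpreter overhead, its absolute numbers are not comparable to the figure. The function name, block sizes, strides, and repeat count are hypothetical choices.

```python
import time


def copy_bandwidth_mb_s(block_size: int, stride: int = 1, repeats: int = 20) -> float:
    """Rough copy bandwidth (useful bytes only) for one block size and stride."""
    src = bytearray(block_size)
    dst = bytearray(block_size)
    touched = len(range(0, block_size, stride))  # bytes copied per pass at this stride
    dst[::stride] = src[::stride]  # warm-up pass: brings the block into cache if it fits
    start = time.perf_counter()
    for _ in range(repeats):
        # src[::stride] builds a temporary copy, so this measures more than a pure copy
        dst[::stride] = src[::stride]
    elapsed = time.perf_counter() - start
    return touched * repeats / max(elapsed, 1e-9) / 1e6


if __name__ == "__main__":
    for stride in (1, 8, 64):                            # spatial locality sweep
        for kib in (4, 16, 64, 256, 1024, 4096):         # working set (temporal) sweep
            bw = copy_bandwidth_mb_s(kib * 1024, stride)
            print(f"stride={stride:3d} block={kib:5d} KiB {bw:10.1f} MB/s")
```

Plotting the output against block size may show bandwidth dropping as the block outgrows each cache level, and larger strides reduce useful bandwidth because each fetched cache line contributes fewer bytes.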
Conclusion
• Both media servers exhibit similar cache and memory behavior
• Cache and memory performance is worst at the 300 kbps encoding rate with multiple-stream distribution
• High cache-miss and page-fault rates degrade performance because they waste a significant number of CPU cycles
• For streaming media servers, the memory subsystem, in addition to I/O, is a potential performance bottleneck
• Future research: media object prefetching and stream batching are techniques we are exploring to improve the memory performance of the servers