Instrumenting parsecs raytrace
Instrumenting a benchmark application
Tools and Measurement Techniques
Project by Mário Almeida (EMDC)
Barcelona, 25 April 2012
Index (1/2)
Tools and configuration
● Parsec
  ○ Overview
  ○ Benchmark programs
● Extrae
● Paraver
● Configuration
Index (2/2)
Measurements
● Raytrace
  ○ Overview
  ○ Code
  ○ Inputs
  ○ Traces
  ○ Load Balancing
  ○ Cache misses and instructions
  ○ Execution time
  ○ Configuration comparisons
  ○ Extrae overhead
Conclusions
Tools and configuration
Parsec
Overview
● Benchmark with the following characteristics:
  ○ Multithreaded
  ○ Emerging workloads
  ○ Diverse
  ○ Not HPC-focused
  ○ Research
Parsec
Benchmark programs
● blackscholes
● bodytrack
● canneal
● dedup
● facesim
● ferret
● fluidanimate
● freqmine
● raytrace
● ...
Extrae
● Instrumentation package that traces programs written with shared-memory and message-passing programming models.
Paraver
● Detailed quantitative analysis of a program's performance.
● Concurrent comparative analysis of several traces.
● Support for mixed message passing and shared memory.
● Building of derived metrics.
Configuration (1/4)
Boada server:
● Two six-core CPUs with Hyper-Threading.
● Kills applications after a few minutes.
● 24 GB of RAM.
● Used cpulimit to limit the CPU usage to at most four cores.
Configuration (2/4)
Installed and/or configured:
● Parsec 2.1 with the raytrace package only.
● Extrae 2.2.1.
● Paraver 4.3.0 (on my laptop).
● cpulimit.
● Minor configurations in .bashrc.
● Multiple scripts to clean, build and run.
Configuration (3/4)
Configuration (4/4)
Measurements
Raytrace
Overview
● Physical simulation for visualization
● Computer animation
● Input is a complex object of many triangles.
Raytrace
Code
For every pixel in the image
    calculate trajectory of ray striking pixel
    find closest intersection point of ray with scene geometry
    calculate contribution of all lights at intersection point
    recursively trace specularly reflected ray
end for
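The loop on the Code slide can be made concrete with a small, single-threaded sketch. This is not the Parsec raytrace code: it swaps the triangle mesh for spheres, uses grey-level Lambertian shading without shadow rays, and all names (`Sphere`, `closest_hit`, `trace`, `render`) are hypothetical, chosen only to mirror the four steps of the pseudocode.

```python
import math
from dataclasses import dataclass

def add(a, b): return (a[0] + b[0], a[1] + b[1], a[2] + b[2])
def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def scale(a, s): return (a[0] * s, a[1] * s, a[2] * s)
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def normalize(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))

@dataclass(frozen=True)
class Sphere:
    center: tuple
    radius: float
    albedo: float        # diffuse reflectance (grey level)
    reflectivity: float  # weight given to the mirror-reflected ray

    def hit(self, origin, direction):
        """Distance to the nearest hit in front of the ray, or None.

        Assumes `direction` has unit length and the origin is outside."""
        oc = sub(origin, self.center)
        b = dot(oc, direction)
        disc = b * b - (dot(oc, oc) - self.radius ** 2)
        if disc < 0:
            return None
        t = -b - math.sqrt(disc)
        return t if t > 1e-4 else None

def closest_hit(scene, origin, direction):
    """Step 2: find the closest intersection of the ray with the scene."""
    best = None
    for sphere in scene:
        t = sphere.hit(origin, direction)
        if t is not None and (best is None or t < best[0]):
            best = (t, sphere)
    return best

def trace(scene, lights, origin, direction, depth=2):
    found = closest_hit(scene, origin, direction)
    if found is None:
        return 0.0  # black background
    t, sphere = found
    point = add(origin, scale(direction, t))
    normal = normalize(sub(point, sphere.center))
    # Step 3: sum the contribution of every light (no shadow rays here).
    diffuse = sum(max(0.0, dot(normal, normalize(sub(light, point))))
                  for light in lights)
    color = sphere.albedo * diffuse
    # Step 4: recursively trace the specularly reflected ray.
    if depth > 0 and sphere.reflectivity > 0:
        mirrored = sub(direction, scale(normal, 2 * dot(direction, normal)))
        color += sphere.reflectivity * trace(scene, lights, point,
                                             mirrored, depth - 1)
    return color

def render(scene, lights, width, height):
    """Outer loop: one primary ray per pixel, camera at the origin."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Step 1: trajectory of the ray through pixel (x, y).
            u = (x + 0.5) / width * 2 - 1
            v = 1 - (y + 0.5) / height * 2
            row.append(trace(scene, lights, (0.0, 0.0, 0.0),
                             normalize((u, v, -1.0))))
        image.append(row)
    return image
```

Each pixel is computed independently of the others, which is what makes the render loop a natural candidate for splitting across threads.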
Raytrace
Inputs
● simsmall - 1 million polygons (480x270)
● simmedium - 1 million polygons (960x540)
● simlarge - 1 million polygons (1920x1080)
● native - 10 million polygons (1920x1080)
Raytrace
Trace (1/2)
Only 10% of the execution time is parallel!
Trace legend: Not created, Running
Render time is proportional to the number of frames!
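To put the 10% figure in perspective, Amdahl's law bounds the whole-program speedup when only a fraction of the run is parallel. The sketch below assumes the 10% is the parallel fraction of a single-threaded run, which the slide does not state explicitly; the function names are illustrative.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Whole-program speedup predicted by Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on the speedup as the thread count grows without limit."""
    if not 0.0 <= parallel_fraction < 1.0:
        raise ValueError("parallel_fraction must be in [0, 1)")
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    for threads in (1, 4, 8, 64):
        print(threads, round(amdahl_speedup(0.10, threads), 3))
    print("limit", round(amdahl_limit(0.10), 3))
```

Under that assumption the whole program cannot run more than about 1.11 times faster however many threads are used, which is presumably why the execution-time slides report the parallel code only.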
Raytrace
Trace (2/2)
Phases marked on the trace: Render; Init and adding object; Build Context
Raytrace
Load balancing (1/2)
Labels on the trace: Not created; Barrier; Create Threads Task; Wait for all threads
Good load balancing between the slave threads.
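One common way to put a number on "good load balancing" is the ratio of the average to the maximum per-thread busy time, where 1.0 means perfectly even work. The slides do not say which metric was read from the trace, so this is only a sketch of that one definition, and the function name is hypothetical.

```python
from statistics import mean


def load_balance(busy_seconds):
    """Average busy time divided by the longest busy time.

    1.0 means every thread worked equally long; values near
    1/len(busy_seconds) mean one thread did almost all the work.
    """
    if not busy_seconds:
        raise ValueError("need at least one thread")
    longest = max(busy_seconds)
    if longest <= 0:
        raise ValueError("busy times must be positive")
    return mean(busy_seconds) / longest
```

For example, four threads that are each busy for the same time give 1.0, while one thread busy for 4 s next to two threads busy for 2 s gives about 0.67.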
Raytrace
Load balancing (2/2)
Raytrace
Cache and instructions
Figure annotations: High number of cache misses; Very low number of cache misses
There were no significant differences in IPC between threads.
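The IPC and cache-miss comparisons are derived metrics built from raw hardware counters. The sketch below shows the arithmetic, assuming per-thread totals of instructions, cycles and cache misses are available (for instance PAPI presets such as PAPI_TOT_INS and PAPI_TOT_CYC); the slides do not list which counters were collected, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadCounters:
    instructions: int
    cycles: int
    cache_misses: int

    def ipc(self) -> float:
        """Instructions completed per cycle."""
        return self.instructions / self.cycles

    def misses_per_kilo_instruction(self) -> float:
        """Cache misses per 1000 instructions (MPKI)."""
        return 1000.0 * self.cache_misses / self.instructions


def relative_ipc_spread(threads) -> float:
    """(max IPC - min IPC) / mean IPC; close to 0 when threads behave alike."""
    values = [t.ipc() for t in threads]
    return (max(values) - min(values)) / (sum(values) / len(values))
```

A spread near zero across the worker threads is one way to express "no significant differences in IPC" as a number.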
Raytrace
Execution time (1/3)
These are average times from multiple executions of the parallel code only, without Extrae overhead.
There was a high average deviation of 0.3 seconds in the experiments.
Larger inputs gave more consistent measurements.
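The averages and deviations quoted on the execution-time slides can be reproduced from repeated timings with a few lines. This sketch assumes "average deviation" means the mean absolute deviation from the mean, which the slides do not define; any timings passed in are the caller's own, none are taken from the experiments.

```python
from statistics import mean


def average_deviation(samples):
    """Mean absolute deviation of the samples from their mean."""
    if not samples:
        raise ValueError("need at least one sample")
    centre = mean(samples)
    return mean([abs(x - centre) for x in samples])


def summarize(samples):
    """Return (mean, average deviation) for a list of run times in seconds."""
    return mean(samples), average_deviation(samples)
```

Reporting the deviation next to the mean makes it clear when a difference between two thread counts is smaller than the run-to-run noise.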
Raytrace
Execution time (2/3)
There was a smaller average deviation of 0.03 seconds.
With 64 threads it runs almost three times faster!
Raytrace
Execution time (3/3)
There was an even smaller average deviation of 0.02 seconds.
With 64 threads it runs almost three times faster!
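The "almost three times faster" figure can be read as a speedup, and dividing it by the thread count gives the parallel efficiency. The sketch assumes the baseline is the single-threaded run of the same parallel section, which the slides imply but do not state; the function names are illustrative.

```python
def speedup(baseline_seconds: float, parallel_seconds: float) -> float:
    """How many times faster the parallel run is than the baseline."""
    if baseline_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("times must be positive")
    return baseline_seconds / parallel_seconds


def efficiency(baseline_seconds: float, parallel_seconds: float,
               threads: int) -> float:
    """Speedup per thread; 1.0 would be ideal linear scaling."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return speedup(baseline_seconds, parallel_seconds) / threads
```

A speedup of 3 on 64 threads corresponds to an efficiency below 5%, which is not surprising given that the server has far fewer than 64 hardware threads.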
Raytrace
Configuration comparison
In the case of the limited configuration, although performance doesn't seem to degrade, the execution time seems to stabilize for more than 8 threads.
Raytrace
Extrae overhead
Conclusions
Conclusions (1/3)
● The system seemed to perform worse when the number of threads was a multiple of the total number of physical cores.
● The program has good load balancing.
● Fine-grained parallelism.
Conclusions (2/3)
● Although it wasn't possible to verify, increasing the input should cause more cache misses, because of the big working sets that won't fit in memory.
● Memory bandwidth should be the main obstacle to good speedups.
● Boada killed almost all the native input executions.
Conclusions (3/3)
● Paraver simplifies the process of analyzing an application's performance.
● Better knowledge of the system's architecture would be needed in order to further analyse the performance of the application.
Questions