3/12/2013 Computer Engg, IIT(BHU)
OpenMP-3
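OMP_INIT_LOCK
OMP_INIT_NEST_LOCK
Purpose:
● This subroutine initializes a lock associated with the lock variable. The lock starts out unlocked.
● The nest routine initializes a nestable lock, which its owner may set more than once. Nestable locks predate OpenMP version 3.0; what version 3.0 changed is that locks are owned by tasks rather than threads.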
OMP_DESTROY_LOCK
OMP_DESTROY_NEST_LOCK
Purpose:
● This subroutine disassociates the given lock variable from any locks. The lock must be unlocked when it is destroyed.
● The nest routine does the same for a nestable lock.
OMP_SET_LOCK
OMP_SET_NEST_LOCK
Purpose:
● This subroutine forces the executing thread to wait until the specified lock is available. A thread is granted ownership of a lock when it becomes available.
● The nest routine does the same for a nestable lock. If the caller already owns the lock, the call returns immediately and the nesting count is incremented.
OMP_UNSET_LOCK
OMP_UNSET_NEST_LOCK
Purpose:
● This subroutine releases the lock held by the executing thread.
● The nest routine decrements the nesting count of a nestable lock and releases the lock when the count reaches zero.
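OMP_TEST_LOCK
OMP_TEST_NEST_LOCK
Purpose:
● This subroutine attempts to set a lock, but does not block if the lock is unavailable. It returns a nonzero value if the lock was set and zero otherwise.
● The nest routine does the same for a nestable lock and returns the new nesting count on success, or zero on failure.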
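The lock routines above are normally used together: initialize once, set and unset around the protected update, destroy at the end. The following is a minimal sketch of that life cycle, not code from the original slides. The helper names count_with_lock and try_acquire_free_lock are hypothetical, and the serial stand-ins under #else are an assumption added only so the sketch also builds when the compiler has no OpenMP support.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumption: serial stand-ins so the sketch builds without OpenMP support.
typedef int omp_lock_t;
inline void omp_init_lock(omp_lock_t* l)  { *l = 0; }
inline void omp_destroy_lock(omp_lock_t*) {}
inline void omp_set_lock(omp_lock_t* l)   { *l = 1; }
inline void omp_unset_lock(omp_lock_t* l) { *l = 0; }
inline int  omp_test_lock(omp_lock_t* l)  { if (*l) return 0; *l = 1; return 1; }
#endif

// Hypothetical helper: every iteration adds 1 to a shared counter,
// with the update protected by a simple (non-nestable) lock.
long count_with_lock(int iterations) {
    assert(iterations >= 0);
    omp_lock_t lock;
    omp_init_lock(&lock);          // the lock starts out unlocked
    long counter = 0;

    #pragma omp parallel for shared(counter, lock)
    for (int i = 0; i < iterations; ++i) {
        omp_set_lock(&lock);       // waits until the lock is available
        ++counter;                 // only the lock owner executes this line
        omp_unset_lock(&lock);     // release so another thread can proceed
    }

    omp_destroy_lock(&lock);       // the lock is unlocked at this point
    return counter;
}

// Hypothetical helper: omp_test_lock on a free lock acquires it
// without blocking and returns a nonzero value.
bool try_acquire_free_lock() {
    omp_lock_t lock;
    omp_init_lock(&lock);
    bool acquired = (omp_test_lock(&lock) != 0);
    if (acquired) omp_unset_lock(&lock);
    omp_destroy_lock(&lock);
    return acquired;
}
```

Calling count_with_lock(10000) should return 10000 regardless of the number of threads, because no two threads update the counter at the same time. For a counter this simple a reduction would be cheaper; the lock is used here only to show the routines.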
OMP_GET_WTIME
Purpose:
● Provides a portable wall clock timing routine
● Returns a double-precision floating point value equal to the number of elapsed seconds since some point in the past. Usually used in "pairs" with the value of the first call subtracted from the value of the second call to obtain the elapsed time for a block of code.
● Designed to be "per thread" times, and therefore may not be globally consistent across all threads in a team - depends upon what a thread is doing compared to other threads.
OMP_GET_WTICK
Purpose:
● Provides a portable wall clock timing routine
● Returns a double-precision floating point value equal to the number of seconds between successive clock ticks.
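The "pairs" usage of OMP_GET_WTIME described above can be sketched as follows. The helper name time_sum and the harmonic-sum workload are illustrative assumptions, and the std::chrono stand-ins under #else are an assumption added only so the sketch also builds without OpenMP support.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#else
#include <chrono>
// Assumption: serial stand-ins based on std::chrono::steady_clock.
inline double omp_get_wtime() {
    using clock = std::chrono::steady_clock;
    return std::chrono::duration<double>(clock::now().time_since_epoch()).count();
}
inline double omp_get_wtick() {
    using period = std::chrono::steady_clock::period;
    return static_cast<double>(period::num) / period::den;
}
#endif

// Hypothetical helper: times a simple summation loop.
// Returns the elapsed wall clock seconds and stores the sum in *result.
double time_sum(long n, double* result) {
    assert(result != nullptr);
    double start = omp_get_wtime();      // first call of the pair
    double sum = 0.0;
    for (long i = 0; i < n; ++i)
        sum += 1.0 / (i + 1.0);
    double end = omp_get_wtime();        // second call of the pair
    *result = sum;
    return end - start;                  // elapsed time for the block
}
```

The value returned by omp_get_wtick() gives the resolution of this timer, so an elapsed time close to one tick should be read as "too short to measure" rather than as a precise figure.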
Performance-Related Issues
Best Practices
Optimize Barrier Use
Avoid the Ordered Construct
Avoid Large Critical Regions
Maximize Parallel Regions
Address Poor Load Balance
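As an illustration of the last item, here is a small sketch of one common way to address poor load balance: a dynamic loop schedule. The workload, the chunk size of 16, and the names triangular_work and balanced_total are assumptions chosen for the example, not part of the original slides.

```cpp
#include <cassert>

// Hypothetical workload: the cost of iteration i grows with i, so equal-sized
// static chunks would leave the last thread with far more work than the first.
long triangular_work(int i) {
    long s = 0;
    for (int j = 0; j <= i; ++j)
        s += j;
    return s;                      // equals i * (i + 1) / 2
}

long balanced_total(int n) {
    assert(n >= 0);
    long total = 0;
    // schedule(dynamic, 16): a thread that finishes early takes the next
    // chunk of 16 iterations instead of sitting idle.
    // reduction(+ : total): each thread sums privately, so no critical
    // region is needed inside the loop.
    #pragma omp parallel for schedule(dynamic, 16) reduction(+ : total)
    for (int i = 0; i < n; ++i)
        total += triangular_work(i);
    return total;
}
```

Dynamic scheduling adds some bookkeeping per chunk, so the chunk size is a trade-off between balance and overhead and is worth measuring on the target machine.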
Intel Core i7 Processor
Features
Model Name: Intel(R) Core(TM) i7 CPU 930 @ 2.80GHz
Cache size: 8192 KB
Number of cores: 4, number of threads: 8
Max Turbo Frequency: 3.8 GHz
Max Memory Bandwidth: 21 GB/s
This quad-core processor features 8-way multitasking capability and additional L3 cache.
Intel® Hyper-Threading Technology (Intel® HT Technology): allows each core of your processor to work on two tasks at the same time.
AMD Phenom II
Frequency: 3.2 GHz
Total L2 Cache: 3 MB
L3 Cache: 6 MB
The AMD Phenom™ II X6 1090T shifts frequency speed from 3.2GHz on six cores, to 3.6GHz on three cores.
Pi function on Intel i7 Processor
Model Name: Intel(R) Core(TM) i7 CPU 930 @ 2.80GHz
Cache size: 8192 KB
Terms in Pi function: 10 crore (100 million)
User time decreases as the number of threads increases up to 8. Scalability falls off rapidly beyond 4 threads, because the Intel i7 is a quad-core machine and the remaining 4 hardware threads come from Hyper-Threading.
Pi function on AMD Phenom II
User time decreases as the number of threads increases up to 6. Scalability falls off rapidly beyond 6 threads, which is the number of cores on this processor.
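Just 4 statements would do it!
// Needs <cmath>, <iostream> and <omp.h>.
// Assumes totalTerms, pi (initialized to 0) and mypi are declared earlier.
#pragma omp parallel shared(totalTerms, pi) private(mypi)
{
    mypi = 0;                          // per-thread partial sum
    #pragma omp for                    // the loop variable is private to each thread
    for (int i = 0; i < totalTerms; i++)
        mypi += 4.0 * (pow(-1.0, i) / double(2 * i + 1));
    #pragma omp critical (update_pi)
    {
        pi += mypi;                    // one thread at a time adds its partial sum
    }
    #pragma omp single
    {
        std::cout << "omp_get_num_threads()=" << omp_get_num_threads() << "\n";
    }
}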
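The private partial sum plus critical update above is the pattern that the OpenMP reduction clause expresses directly. The following is an alternative sketch, not the version used for the measurements in these slides. The function name leibniz_pi is hypothetical, and the sign is computed with a parity test instead of pow, which is a swapped-in simplification.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: sums the first totalTerms terms of the series
// 4 * (1 - 1/3 + 1/5 - 1/7 + ...).
double leibniz_pi(long totalTerms) {
    assert(totalTerms >= 0);
    double pi = 0.0;
    // reduction(+ : pi): each thread gets a private copy of pi starting at 0,
    // and the copies are added together when the loop ends.
    #pragma omp parallel for reduction(+ : pi)
    for (long i = 0; i < totalTerms; ++i) {
        double sign = (i % 2 == 0) ? 1.0 : -1.0;   // same value as pow(-1, i)
        pi += 4.0 * sign / (2.0 * i + 1.0);
    }
    return pi;
}
```

Because floating point addition is not associative, the last few digits of the result can differ between runs with different thread counts. The error of the series itself, roughly 1/totalTerms, is much larger than that effect for the term counts used here.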
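Pi function (for loop nested in a while loop) on i7 Processor
Just 6 statements would do it!
// Needs <cmath> and <omp.h>.
// Assumes totalTerms, No_Iterations, the array pi, k (initialized to 0) and mypi are declared earlier.
#pragma omp parallel shared(totalTerms, pi, k) private(mypi)
{
    while (k < No_Iterations) {
        #pragma omp single
        {
            pi[k] = 0;                 // one thread resets the result; implicit barrier follows
        }
        mypi = 0;                      // per-thread partial sum
        #pragma omp for
        for (int i = 0; i < totalTerms; i++)
            mypi += 4.0 * (pow(-1.0, i) / double(2.0 * i + 1.0));
        #pragma omp critical (update_pi)
        {
            pi[k] += mypi;
        }
        #pragma omp barrier            // all partial sums are added before k changes
        #pragma omp single
        {
            k++;                       // implicit barrier: every thread sees the new k
        }
    }
}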
Summary
OpenMP provides a compact, yet powerful programming model for shared memory programming
OpenMP preserves the sequential version of the program
Summary
● Developing an OpenMP program:
➢ Start from a sequential program.
➢ Identify the code segment that takes most of the time.
➢ Determine whether the important loops can be parallelized.
  • The loops may have critical sections, reduction variables, etc.
➢ Determine the shared and private variables.
➢ Add directives.
➢ See, for example, the pi.c and piomp.c programs.
● Challenges in developing correct OpenMP programs:
➢ Dealing with loop-carried dependences
➢ Removing unnecessary dependencies
➢ Managing shared and private variables
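To make the first two challenges concrete, here is a small sketch of removing a loop-carried dependence. The example, including the names squares_serial and squares_parallel, is an illustrative assumption and does not come from pi.c or piomp.c.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sequential version: x is updated from its value in the previous
// iteration, so the iterations cannot safely run in parallel as written.
std::vector<double> squares_serial(int n, double x0, double step) {
    assert(n >= 0);
    std::vector<double> a(n);
    double x = x0;
    for (int i = 0; i < n; ++i) {
        x += step;                 // loop-carried dependence on x
        a[i] = x * x;
    }
    return a;
}

// Parallel version: x is recomputed from the loop index, so every
// iteration is independent and x is private to each iteration.
std::vector<double> squares_parallel(int n, double x0, double step) {
    assert(n >= 0);
    std::vector<double> a(n);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        double x = x0 + (i + 1) * step;   // depends only on i
        a[i] = x * x;
    }
    return a;
}
```

The two versions can differ slightly in the last bits for step values that are not exactly representable, since repeated addition and a single multiplication round differently. Comparing them with a small tolerance is a reasonable check when converting a loop this way.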