Portable Operating System Interface Thread

Yukai Hung
a0934147@gmail.com
Department of Mathematics
National Taiwan University
POSIX Thread Basic
What is a process? What is a thread?
- a thread of execution is the smallest unit of processing that the operating system can schedule; it is contained inside a process
- multiple threads can exist within the same process and share its resources, while different processes do not share resources

How to create a new process?
- use the system function fork(), which creates a copy of the calling process
- the parent and child processes can tell each other apart by examining the return value of fork(): zero in the child, the child's process ID (a non-zero value) in the parent
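The return-value rule for fork() can be sketched as follows. This is a minimal illustration, not part of the original slides: the helper name `run_child` is hypothetical, and the use of `waitpid` to collect the child's exit code is an added assumption about how the parent follows up.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code` and
   return that code as observed by the parent (-1 on failure). */
int run_child(int code)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed, no child was created */

    if (pid == 0)
        _exit(code);            /* child: fork() returned zero */

    /* parent: fork() returned the child's process ID */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_child(7)` from the parent should yield 7, since the child takes the `pid == 0` branch and the parent takes the other one.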
int pthread_create(…)
create a new thread with the specified thread attributes and execute the thread function with the specified function argument
http://opengroup.org/onlinepubs/007908799/xsh/pthread_create.html

void pthread_exit(…)
terminate the calling thread and make the return value pointer available to any successful join with the terminating thread
http://opengroup.org/onlinepubs/007908799/xsh/pthread_exit.html

int pthread_join(…)
suspend execution of the calling thread until the target thread terminates, unless the target thread has already terminated
http://opengroup.org/onlinepubs/007908799/xsh/pthread_join.html
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <pthread.h>

    /* the thread function is defined after main */
    void* tfunction(void* input);

    int main(int argc,char** argv)
    {
        int error1;
        int error2;
        int input1;
        int input2;

        /* pthread_join stores a void*, so receive it in a void* */
        void* return1;
        void* return2;

        pthread_t thread1;
        pthread_t thread2;

        input1=1;
        input2=2;

        error1=pthread_create(&thread1,NULL,tfunction,(void*)&input1);
        error2=pthread_create(&thread2,NULL,tfunction,(void*)&input2);

        if(error1!=0||error2!=0)
        {
            printf("Error:thread create\n");
            return 1;
        }

        error1=pthread_join(thread1,&return1);
        error2=pthread_join(thread2,&return2);

        if(error1!=0||error2!=0)
            printf("Error:thread join\n");

        printf("thread 1 return %d\n",(int)(intptr_t)return1);
        printf("thread 2 return %d\n",(int)(intptr_t)return2);

        return 0;
    }

    void* tfunction(void* input)
    {
        printf("thread %d is executing\n",*((int*)input));

        pthread_exit((void*)1);
    }
int pthread_equal(…)
compare two threads through their thread handles
http://opengroup.org/onlinepubs/007908799/xsh/pthread_equal.html

pthread_t pthread_self(…)
return the thread handle of the calling thread
http://opengroup.org/onlinepubs/007908775/xsh/pthread_self.html

int pthread_cancel(…)
request that the thread be canceled; the target thread's cancelability state and type determine when the cancellation takes effect
http://opengroup.org/onlinepubs/007908775/xsh/pthread_cancel.html
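A short sketch of pthread_self() and pthread_equal() working together. The struct `probe` and both function names are hypothetical, introduced only for this illustration. The worker compares its own handle against the handle of the thread that created it, which is still alive at that point.

```c
#include <pthread.h>

/* Hypothetical record shared between the creator and the worker. */
struct probe
{
    pthread_t creator;          /* handle of the creating thread */
    int same_as_creator;        /* result of the comparison in the worker */
};

static void* compare_with_creator(void* arg)
{
    struct probe* p = (struct probe*)arg;

    /* pthread_t is opaque: compare with pthread_equal, never with == */
    p->same_as_creator = pthread_equal(pthread_self(), p->creator) != 0;
    return NULL;
}

/* Returns 0 when the worker correctly sees itself as a different thread,
   1 if the handles compared equal, and -1 on create or join failure. */
int worker_is_creator(void)
{
    struct probe p;
    pthread_t thread;

    p.creator = pthread_self();
    p.same_as_creator = -1;

    if (pthread_create(&thread, NULL, compare_with_creator, &p) != 0)
        return -1;
    if (pthread_join(thread, NULL) != 0)
        return -1;

    return p.same_as_creator;
}

/* A thread handle always compares equal to itself. */
int self_equals_self(void)
{
    return pthread_equal(pthread_self(), pthread_self()) != 0;
}
```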
void pthread_cleanup_push(…)
push the specified cancellation cleanup handler routine onto the calling thread's cancellation cleanup stack
http://linux.die.net/man/3/pthread_cleanup_push

void pthread_cleanup_pop(…)
remove the routine at the top of the calling thread's cancellation cleanup stack and optionally invoke it (if the argument is non-zero)
http://linux.die.net/man/3/pthread_cleanup_pop
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <pthread.h>

    void* tfunction(void* input);
    void cleanup(void* string);

    int main(int argc,char** argv)
    {
        /* pthread_join stores a void*, so receive it in a void* */
        void* rvalue=NULL;
        pthread_t thread;

        if(pthread_create(&thread,NULL,tfunction,(void*)1)!=0)
            printf("Error:thread create\n");

        if(pthread_join(thread,&rvalue)!=0)
            printf("Error:thread join\n");

        printf("thread return %d\n",(int)(intptr_t)rvalue);

        return 0;
    }

    void* tfunction(void* input)
    {
        printf("thread start\n");

        pthread_cleanup_push(cleanup,"thread first handler");
        pthread_cleanup_push(cleanup,"thread second handler");
        printf("thread push complete\n");

        /* handlers run in reverse order of the pushes */
        pthread_cleanup_pop(1);
        pthread_cleanup_pop(1);

        return (void*)1;
    }

    void cleanup(void* string)
    {
        printf("cleanup:%s\n",(char*)string);

        return;
    }
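The program above runs its handlers through pthread_cleanup_pop(1). A complementary sketch, under the assumption of the default deferred cancelability, shows a handler running because the thread is canceled. The names `count_cleanup`, `sleeper`, and `cancel_runs_cleanup` are hypothetical.

```c
#include <pthread.h>
#include <unistd.h>

/* Written by the canceled thread, read by the caller after the join. */
static int cleanup_calls = 0;

static void count_cleanup(void* arg)
{
    (void)arg;
    cleanup_calls++;
}

static void* sleeper(void* arg)
{
    (void)arg;

    pthread_cleanup_push(count_cleanup, NULL);
    sleep(60);                  /* sleep() is a cancellation point */
    pthread_cleanup_pop(0);     /* reached only if nobody cancels us */

    return NULL;
}

/* Returns 1 if the thread was canceled and its handler ran exactly once,
   0 if not, and -1 if a pthread call failed. */
int cancel_runs_cleanup(void)
{
    pthread_t thread;
    void* result = NULL;

    cleanup_calls = 0;

    if (pthread_create(&thread, NULL, sleeper, NULL) != 0)
        return -1;
    if (pthread_cancel(thread) != 0)
        return -1;
    if (pthread_join(thread, &result) != 0)
        return -1;

    return result == PTHREAD_CANCELED && cleanup_calls == 1;
}
```

With deferred cancellation the request only takes effect at a cancellation point, and the first one the worker reaches is the sleep() call after the push, so the handler is already registered when the cancellation is acted on.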
Race Condition and Mutex Lock
Consider the following parallel program
- the threads will almost never execute their instructions at exactly the same time
Scenario 1
- the final value of R is 2 if the initial value of R is 1

Scenario 2
- the final value of R is 2 if the initial value of R is 1

Scenario 3
- the final value of R is 3 if the initial value of R is 1
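The three outcomes can be reproduced with a small deterministic simulation. It rests on an assumption, since the original figures are not available here: each of two threads is taken to perform R = R + 1 as a separate load and store. The function name `simulate_increments` and the string encoding of the interleaving are hypothetical.

```c
#include <stddef.h>

/* Simulate two threads, 'A' and 'B', that each run R = R + 1 in two steps:
   a load of R into a private register, then a store of register + 1.
   `order` names the thread that takes its next step, e.g. "AABB". */
int simulate_increments(int r, const char* order)
{
    int reg[2]  = {0, 0};       /* private register of each thread */
    int step[2] = {0, 0};       /* 0: load next, 1: store next, 2: done */

    for (size_t i = 0; order[i] != '\0'; i++) {
        int t = (order[i] == 'A') ? 0 : 1;

        if (step[t] == 0)
            reg[t] = r;         /* load the shared value */
        else if (step[t] == 1)
            r = reg[t] + 1;     /* store the incremented value */

        if (step[t] < 2)
            step[t]++;
    }

    return r;
}
```

Starting from R = 1, the orders "ABAB" and "ABBA" lose one update and end at 2, while "AABB" keeps the two increments apart and ends at 3, matching the pattern of the scenarios above.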
Solve the race condition by locking
- manage the shared resource between threads
- avoid deadlock or imbalance problems
Guarantee that the instructions execute in the correct order
- the problem falls back to a sequential procedure
- the lock and release procedures have high overhead
Solve the race condition with a semaphore
- a multi-valued locking method (an extension of binary locking)
- the instructions in procedures P and V are atomic operations
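The slides do not show semaphore code, so the following is only a sketch using the POSIX unnamed semaphore calls sem_init, sem_wait (the P operation), and sem_post (the V operation). It assumes a platform that supports unnamed semaphores, such as Linux. The constants `WORKERS` and `SLOTS` and all function names are hypothetical.

```c
#include <pthread.h>
#include <semaphore.h>

#define WORKERS 8
#define SLOTS   2

static sem_t slots;             /* counting semaphore, starts at SLOTS */
static pthread_mutex_t guard = PTHREAD_MUTEX_INITIALIZER;
static int inside = 0;          /* threads currently between P and V */
static int peak = 0;            /* largest value of inside ever seen */

static void* worker(void* arg)
{
    (void)arg;

    sem_wait(&slots);           /* P: blocks while the count is zero */

    pthread_mutex_lock(&guard);
    inside++;
    if (inside > peak)
        peak = inside;
    pthread_mutex_unlock(&guard);

    pthread_mutex_lock(&guard);
    inside--;
    pthread_mutex_unlock(&guard);

    sem_post(&slots);           /* V: raises the count, waking one waiter */
    return NULL;
}

/* Run WORKERS threads through a region limited to SLOTS threads and
   return the highest concurrency observed (-1 if sem_init fails). */
int peak_concurrency(void)
{
    pthread_t threads[WORKERS];
    int created = 0;

    inside = 0;
    peak = 0;

    if (sem_init(&slots, 0, SLOTS) != 0)
        return -1;

    for (int i = 0; i < WORKERS; i++)
        if (pthread_create(&threads[created], NULL, worker, NULL) == 0)
            created++;

    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);

    sem_destroy(&slots);
    return peak;
}
```

With SLOTS set to 1 the semaphore behaves like the binary lock described earlier; larger values admit that many threads at once.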
    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    void* tfunction(void* input);

    int main(int argc,char** argv)
    {
        int value;
        int error1;
        int error2;

        pthread_t thread1;
        pthread_t thread2;

        value=0;

        error1=pthread_create(&thread1,NULL,tfunction,(void*)&value);
        error2=pthread_create(&thread2,NULL,tfunction,(void*)&value);

        if(error1!=0||error2!=0)
            printf("Error:thread create\n");

        error1=pthread_join(thread1,NULL);
        error2=pthread_join(thread2,NULL);

        if(error1!=0||error2!=0)
            printf("Error:thread join\n");

        printf("final result is %d\n",value);

        return 0;
    }

    /* unprotected read-modify-write on shared data: this is the race */
    void* tfunction(void* input)
    {
        *((int*)input)=*((int*)input)+1;

        return NULL;
    }
int pthread_mutex_init(…)
initialize the mutex referenced by mutex with the specified attributes; initializing an already initialized mutex results in undefined behavior
http://opengroup.org/onlinepubs/007908775/xsh/pthread_mutex_init.html

int pthread_mutex_destroy(…)
destroy a previously initialized mutex; the mutex must not be used after it has been destroyed
http://www.mkssoftware.com/docs/man3/pthread_mutex_destroy.3.asp
int pthread_mutex_lock(…)
lock the specified initialized mutex; if the mutex is already locked, the calling thread blocks until the mutex becomes available
http://www.mkssoftware.com/docs/man3/pthread_mutex_lock.3.asp

int pthread_mutex_unlock(…)
attempt to unlock the specified mutex; if threads are blocked on the mutex when the unlock function is called and the mutex becomes available, the scheduling policy determines which thread acquires the mutex
http://www.mkssoftware.com/docs/man3/pthread_mutex_unlock.3.asp

int pthread_mutex_trylock(…)
try to lock the specified mutex; if the mutex is already locked, an error is returned, otherwise the operation returns with the mutex in the locked state and the calling thread as its owner
http://www.mkssoftware.com/docs/man3/pthread_mutex_trylock.3.asp
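The difference between blocking and trying can be sketched with two threads. The helper names are hypothetical, and the static initializer PTHREAD_MUTEX_INITIALIZER is used here in place of pthread_mutex_init for brevity. POSIX specifies EBUSY as the error returned by pthread_mutex_trylock when the mutex is already locked.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Worker: try to take the mutex without blocking and report the result. */
static void* try_once(void* out)
{
    int rc = pthread_mutex_trylock(&lock);

    if (rc == 0)
        pthread_mutex_unlock(&lock);    /* acquired it, so release it */

    *(int*)out = rc;
    return NULL;
}

/* Returns the trylock result seen by a second thread while this thread
   holds the mutex (held != 0) or does not hold it (held == 0).
   Returns -1 if the second thread could not be created. */
int trylock_from_other_thread(int held)
{
    pthread_t thread;
    int rc = -1;

    if (held)
        pthread_mutex_lock(&lock);

    if (pthread_create(&thread, NULL, try_once, &rc) == 0)
        pthread_join(thread, NULL);

    if (held)
        pthread_mutex_unlock(&lock);

    return rc;
}
```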
    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    pthread_mutex_t work_mutex;

    void* tfunction(void* input);

    int main(int argc,char** argv)
    {
        int value;
        int error1;
        int error2;

        pthread_t thread1;
        pthread_t thread2;

        value=0;

        if(pthread_mutex_init(&work_mutex,NULL)!=0)
            printf("Error:work mutex create\n");

        error1=pthread_create(&thread1,NULL,tfunction,(void*)&value);
        error2=pthread_create(&thread2,NULL,tfunction,(void*)&value);

        if(error1!=0||error2!=0)
            printf("Error:thread create\n");

        error1=pthread_join(thread1,NULL);
        error2=pthread_join(thread2,NULL);

        if(error1!=0||error2!=0)
            printf("Error:thread join\n");

        printf("final result is %d\n",value);

        if(pthread_mutex_destroy(&work_mutex)!=0)
            printf("Error:work mutex destroy\n");

        return 0;
    }

    void* tfunction(void* input)
    {
        int* value=(int*)input;

        if(pthread_mutex_lock(&work_mutex)!=0)
            printf("Error:lock work mutex\n");

        /* critical section: only one thread updates the value at a time */
        *value=*value+1;

        if(pthread_mutex_unlock(&work_mutex)!=0)
            printf("Error:work mutex unlock\n");

        return NULL;
    }
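With a single increment per thread, the race in the unlocked version rarely shows up in practice. A heavier variant of the same idea, sketched below with hypothetical names and iteration counts, makes the effect of the mutex easier to check: the total must equal the number of threads times the number of iterations.

```c
#include <pthread.h>

#define NTHREADS 4
#define NITERS   100000

static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

static void* add_many(void* arg)
{
    (void)arg;

    for (int i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&counter_mutex);
        counter = counter + 1;          /* critical section */
        pthread_mutex_unlock(&counter_mutex);
    }

    return NULL;
}

/* Run NTHREADS incrementing threads and return the final counter. */
long locked_total(void)
{
    pthread_t threads[NTHREADS];
    int created = 0;

    counter = 0;

    for (int i = 0; i < NTHREADS; i++)
        if (pthread_create(&threads[created], NULL, add_many, NULL) == 0)
            created++;

    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);

    return counter;
}
```

Removing the lock and unlock calls turns this back into the racy program, where the total is allowed to come out smaller.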
Signal and Condition Variable
int pthread_cond_init(…)
initialize the condition variable referenced by cond with the specified attributes; initializing an already initialized condition variable results in undefined behavior
http://opengroup.org/onlinepubs/007908775/xsh/pthread_cond_init.html

int pthread_cond_destroy(…)
destroy a previously initialized condition variable; the condition variable must not be used after it has been destroyed
    #include <stdio.h>
    #include <unistd.h>
    #include <pthread.h>

    int loop=1;

    pthread_cond_t cond;
    pthread_mutex_t mutex;

    void* fthread1(void* input);
    void* fthread2(void* input);

    int main(int argc,char** argv)
    {
        pthread_t thread1;
        pthread_t thread2;

        pthread_cond_init(&cond,NULL);
        pthread_mutex_init(&mutex,NULL);

        pthread_create(&thread1,NULL,fthread1,(void *)NULL);
        pthread_create(&thread2,NULL,fthread2,(void *)NULL);

        pthread_join(thread1,NULL);
        pthread_join(thread2,NULL);

        pthread_cond_destroy(&cond);
        pthread_mutex_destroy(&mutex);

        return 0;
    }

    /* print the values that are not multiples of 3, signal on the others */
    void* fthread1(void* input)
    {
        for(loop=1;loop<=9;loop++)
        {
            pthread_mutex_lock(&mutex);

            if(loop%3==0)
                pthread_cond_signal(&cond);
            else
                printf("thread1:%d\n",loop);

            pthread_mutex_unlock(&mutex);

            sleep(1);
        }

        return NULL;
    }

    /* wait for the signal and print the multiples of 3 */
    void* fthread2(void* input)
    {
        while(loop<9)
        {
            pthread_mutex_lock(&mutex);

            /* re-check the condition after waking: wakeups may be spurious */
            while(loop%3!=0)
                pthread_cond_wait(&cond,&mutex);

            printf("thread2:%d\n",loop);

            pthread_mutex_unlock(&mutex);

            sleep(1);
        }

        return NULL;
    }
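The program above relies on sleep() for its timing. A smaller sketch of the usual wait pattern, with hypothetical names and static initializers in place of the init calls, keeps a predicate under the mutex and waits in a loop, so it works whether the signal arrives before or after the wait begins.

```c
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
static int ready = 0;           /* the predicate, protected by m */
static int payload = 0;         /* the data handed over, protected by m */

static void* producer(void* arg)
{
    pthread_mutex_lock(&m);
    payload = *(int*)arg;
    ready = 1;                  /* change the predicate under the lock */
    pthread_cond_signal(&c);
    pthread_mutex_unlock(&m);

    return NULL;
}

/* Start a producer that publishes `value`, wait for it, and return
   what was received (-1 if the thread could not be created). */
int wait_for_value(int value)
{
    pthread_t thread;
    int seen;

    pthread_mutex_lock(&m);
    ready = 0;
    pthread_mutex_unlock(&m);

    if (pthread_create(&thread, NULL, producer, &value) != 0)
        return -1;

    pthread_mutex_lock(&m);
    while (!ready)                      /* loop: wakeups may be spurious */
        pthread_cond_wait(&c, &m);      /* releases m while blocked */
    seen = payload;
    pthread_mutex_unlock(&m);

    pthread_join(thread, NULL);
    return seen;
}
```

If the producer runs first, `ready` is already set and the waiter never blocks; if the waiter runs first, it sleeps inside pthread_cond_wait until the signal arrives.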
Multiple Thread and Multiple GPU
A host thread can maintain one context at a time
- need as many host threads as GPUs to maintain all devices
- multiple host threads can establish contexts with the same GPU; the hardware driver handles time-sharing and resource partitioning

[figure: host thread 0, host thread 1, host thread 2; host memory; device 0, device 1, device 2]
cudaGetDeviceCount()
return the number of devices on the current system with compute capability greater than or equal to 1.0 that are available for execution

cudaSetDevice()
set the device on which the active host thread executes device code; if the host thread has already initialized the CUDA runtime by calling non-device-management runtime functions, an error is returned
must be called prior to context creation and fails if the context has already been established; one can force context creation with cudaFree(0)

cudaGetDevice(…)
return the device on which the active host thread executes device code
    #include <cuda.h>
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    #define MaxDevice 8

    /* per-thread work description, declared before its first use */
    struct pthread_c
    {
        int index;
        int subsz;

        float* hveca;
        float* hvecb;
        float* hvecc;
    };

    void* tfunction(void* content);
    __global__ void vecAdd(float* veca,float* vecb,float* vecc,int size);

    int main(int argc,char** argv)
    {
        int size;
        int loop;
        int subsz;
        int devicecount;

        float* h_veca;
        float* h_vecb;
        float* h_vecc;

        pthread_t threadt[MaxDevice];
        pthread_c threadc[MaxDevice];

        size=32000*4;
        h_veca=(float*)malloc(sizeof(float)*size);
        h_vecb=(float*)malloc(sizeof(float)*size);
        h_vecc=(float*)malloc(sizeof(float)*size);

        for(loop=0;loop<size;loop++)
        {
            h_veca[loop]=1.0f;
            h_vecb[loop]=2.0f;
            h_vecc[loop]=0.0f;
        }

        cudaGetDeviceCount(&devicecount);
        devicecount=(devicecount>MaxDevice)?MaxDevice:devicecount;

        printf("device number is %d\n",devicecount);

        if(devicecount<1)
        {
            printf("Error:no device\n");
            return 1;
        }

        /* size is assumed to be divisible by devicecount;
           otherwise the trailing elements are left uncomputed */
        subsz=size/devicecount;

        for(loop=0;loop<devicecount;loop++)
        {
            threadc[loop].index=loop;
            threadc[loop].subsz=subsz;
            threadc[loop].hveca=h_veca+loop*subsz;
            threadc[loop].hvecb=h_vecb+loop*subsz;
            threadc[loop].hvecc=h_vecc+loop*subsz;
        }

        for(loop=0;loop<devicecount;loop++)
            pthread_create(threadt+loop,NULL,tfunction,(void*)(threadc+loop));

        for(loop=0;loop<devicecount;loop++)
            pthread_join(threadt[loop],NULL);

        for(loop=0;loop<size;loop++)
            if(h_vecc[loop]!=3.0f)
                printf("Error:check result\n");

        free(h_veca);
        free(h_vecb);
        free(h_vecc);

        return 0;
    }

    void* tfunction(void* content)
    {
        int index;
        int subsz;
        int gsize;
        int bsize;

        float *hveca,*dveca;
        float *hvecb,*dvecb;
        float *hvecc,*dvecc;

        index=(*((pthread_c*)content)).index;
        subsz=(*((pthread_c*)content)).subsz;
        hveca=(*((pthread_c*)content)).hveca;
        hvecb=(*((pthread_c*)content)).hvecb;
        hvecc=(*((pthread_c*)content)).hvecc;

        printf("thread %d start!\n",index);

        //for(int loop=0;loop<subsz;loop++)
        //hvecc[loop]=hveca[loop]+hvecb[loop];

        /* bind this host thread to its own device before any other call */
        cudaSetDevice(index);

        cudaMalloc((void**)&dveca,sizeof(float)*subsz);
        cudaMalloc((void**)&dvecb,sizeof(float)*subsz);
        cudaMalloc((void**)&dvecc,sizeof(float)*subsz);

        cudaMemcpy(dveca,hveca,sizeof(float)*subsz,cudaMemcpyHostToDevice);
        cudaMemcpy(dvecb,hvecb,sizeof(float)*subsz,cudaMemcpyHostToDevice);

        bsize=256;
        gsize=(int)ceil((float)subsz/256);

        vecAdd<<<gsize,bsize>>>(dveca,dvecb,dvecc,subsz);

        cudaMemcpy(hvecc,dvecc,sizeof(float)*subsz,cudaMemcpyDeviceToHost);

        cudaFree(dveca);
        cudaFree(dvecb);
        cudaFree(dvecc);

        cudaError_t error;

        if((error=cudaGetLastError())!=cudaSuccess)
            printf("cudaError:%s\n",cudaGetErrorString(error));

        printf("thread %d finish!\n",index);

        return NULL;
    }

    __global__ void vecAdd(float* veca,float* vecb,float* vecc,int size)
    {
        int index;

        index=blockIdx.x*blockDim.x+threadIdx.x;

        if(index<size)
            vecc[index]=veca[index]+vecb[index];

        return;
    }
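The program above splits the vectors with size/devicecount, which covers every element only when the size is divisible by the number of devices (128000 elements over 3 devices would leave 2 elements untouched). One possible way to handle the remainder, shown here as a host-side sketch with a hypothetical helper `chunk_bounds`, is to give the first few workers one extra element each. It assumes `count` is greater than zero.

```c
#include <stddef.h>

/* Hypothetical helper: describe the slice of `size` elements that worker
   `index` out of `count` should process. The first `size % count` workers
   receive one extra element, so every element is covered exactly once. */
void chunk_bounds(size_t size, size_t count, size_t index,
                  size_t* offset, size_t* length)
{
    size_t base  = size / count;        /* elements every worker gets */
    size_t extra = size % count;        /* leftover elements to spread */

    *length = base + (index < extra ? 1 : 0);
    *offset = index * base + (index < extra ? index : extra);
}
```

Each host thread would then use its own offset and length in place of the shared `subsz` when copying to and from its device.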
Constant Memory
Where is constant memory?
- data is stored in the device global memory
- data is read through the multiprocessor constant cache
- 64KB of constant memory and 8KB of cache for each multiprocessor

How about the performance?
- optimized when a warp of threads reads the same location
- 4 bytes per cycle through broadcasting to a warp of threads
- serialized when a warp of threads reads different locations
- very slow on a cache miss (data is read from global memory)
- access latency can range from one to hundreds of clock cycles
How to use constant memory?
- declare constant memory at file scope (as a global variable)
- copy data to constant memory from the host (because it is constant!!)

    //declare constant memory
    __constant__ float cst_ptr[size];

    //copy data from host to constant memory
    cudaMemcpyToSymbol(cst_ptr,host_ptr,data_size);
    #include <math.h>

    //declare constant memory
    __constant__ float cangle[360];

    __global__ void test_kernel(float* darray);

    int main(int argc,char** argv)
    {
        int size=3200;
        float* darray;
        float hangle[360];

        //allocate device memory
        cudaMalloc((void**)&darray,sizeof(float)*size);

        //initialize allocated memory
        cudaMemset(darray,0,sizeof(float)*size);

        //initialize angle array on host
        for(int loop=0;loop<360;loop++)
            hangle[loop]=acos(-1.0f)*loop/180.0f;

        //copy host angle data to constant memory
        cudaMemcpyToSymbol(cangle,hangle,sizeof(float)*360);

        //execute device kernel
        test_kernel<<<size/64,64>>>(darray);

        //free device memory
        cudaFree(darray);

        return 0;
    }

    __global__ void test_kernel(float* darray)
    {
        int index;

        //calculate each thread global index
        index=blockIdx.x*blockDim.x+threadIdx.x;

        #pragma unroll 10
        for(int loop=0;loop<360;loop++)
            darray[index]=darray[index]+cangle[loop];

        return;
    }
    #include <cuda.h>
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    #define MaxDevice 8

    __constant__ float cangle[360];

    /* per-thread work description, declared before its first use */
    struct pthread_c
    {
        int index;
        float* hangle;
        float summation;
    };

    void* tfunction(void* content);
    __global__ void kernel(float* dvector,int size);

    int main(int argc,char** argv)
    {
        int loop;
        int devicecount;

        float summation;
        float hangle[360];

        pthread_t threadt[MaxDevice];
        pthread_c threadc[MaxDevice];

        for(loop=0;loop<360;loop++)
            hangle[loop]=acos(-1.0f)*loop/180.0f;

        for(loop=0,summation=0.0f;loop<360;loop++)
            summation=summation+hangle[loop];

        cudaGetDeviceCount(&devicecount);
        devicecount=(devicecount>MaxDevice)?MaxDevice:devicecount;

        for(loop=0;loop<devicecount;loop++)
        {
            threadc[loop].index=loop;
            threadc[loop].hangle=hangle;
            threadc[loop].summation=summation;
        }

        for(loop=0;loop<devicecount;loop++)
            pthread_create(threadt+loop,NULL,tfunction,(void*)(threadc+loop));

        for(loop=0;loop<devicecount;loop++)
            pthread_join(threadt[loop],NULL);

        return 0;
    }

    void* tfunction(void* content)
    {
        int size;
        int loop;
        int index;
        int gsize;
        int bsize;

        float summation;

        float* hangle;
        float* hvector;
        float* dvector;

        size=32000;

        index=(*((pthread_c*)content)).index;
        hangle=(*((pthread_c*)content)).hangle;
        summation=(*((pthread_c*)content)).summation;

        printf("thread %d start!\n",index);

        /* each device has its own copy of the constant memory symbol */
        cudaSetDevice(index);
        cudaMemcpyToSymbol(cangle,hangle,sizeof(float)*360);

        hvector=(float*)malloc(sizeof(float)*size);
        cudaMalloc((void**)&dvector,sizeof(float)*size);

        bsize=256;
        gsize=(int)ceil((float)size/256);

        kernel<<<gsize,bsize>>>(dvector,size);

        cudaMemcpy(hvector,dvector,sizeof(float)*size,cudaMemcpyDeviceToHost);

        for(loop=0;loop<size;loop++)
            if(hvector[loop]!=summation)
                printf("Error: check result\n");

        free(hvector);
        cudaFree(dvector);

        cudaError_t error;

        if((error=cudaGetLastError())!=cudaSuccess)
            printf("cudaError:%s\n",cudaGetErrorString(error));

        printf("thread %d finish!\n",index);

        return NULL;
    }

    __global__ void kernel(float* dvector,int size)
    {
        int loop;
        int index;

        float temp;

        index=blockIdx.x*blockDim.x+threadIdx.x;

        if(index<size)
        {
            for(loop=0,temp=0.0f;loop<360;loop++)
                temp=temp+cangle[loop];

            *(dvector+index)=temp;
        }

        return;
    }
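The result check above compares floating-point sums with `!=`, which holds only when the host and the device round every addition identically. A more forgiving check, sketched below in plain C, compares within a relative tolerance. The helper names, the literal value of pi, and the tolerance are all hypothetical choices for this illustration rather than anything taken from the slides.

```c
/* Absolute value without depending on the math library. */
static float absf(float x)
{
    return x < 0.0f ? -x : x;
}

/* Sum the 360 angles in radians, in the same order as the host code. */
float angle_sum(void)
{
    const float pi = 3.14159265f;       /* stands in for acos(-1.0f) */
    float sum = 0.0f;

    for (int loop = 0; loop < 360; loop++)
        sum = sum + pi * loop / 180.0f;

    return sum;
}

/* Returns 1 when a and b agree to within the relative tolerance `tol`. */
int nearly_equal(float a, float b, float tol)
{
    float scale = absf(a) > absf(b) ? absf(a) : absf(b);

    if (scale < 1.0f)
        scale = 1.0f;                   /* avoid a vanishing tolerance near 0 */

    return absf(a - b) <= tol * scale;
}
```

The exact sum of the angles is 359 times pi (about 1127.83), so `nearly_equal(angle_sum(), 359.0f * 3.14159265f, 1e-4f)` is expected to hold, and the same comparison could replace the `!=` test on `hvector[loop]`.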