Multithreaded Programming with OpenMP

This article is a repost (2017-05-16 18:53, category: C++/VC++).

OpenMP (Open Multi-Processing)

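Disadvantages of OpenMP:

1: As a high-level abstraction, OpenMP is not well suited to situations that require complex synchronization and mutual exclusion between threads;

2: It cannot be used on non-shared-memory systems (such as computer clusters). On such systems, MPI is more commonly used.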

For implementing critical sections and mutex locks with OpenMP, see reference 3.

==========================Using OpenMP on Windows==========================

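Basic usage:

Using OpenMP in Visual C++ 2010

1: In the project's Properties, under C/C++ > Language, turn on OpenMP Support (compiler option /openmp);

2: When writing a program that uses OpenMP, include the OpenMP header omp.h (strictly required only when the program calls omp_* runtime functions);

3: Put #pragma omp parallel for directly in front of the for loop you want to parallelize.

A simple example: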

//Without OpenMP
#include <stdio.h>
#include <stdlib.h>

void Test(int n) {
    for (int i = 0; i < 10000; ++i) {
        //do nothing, just waste time
    }
    printf("%d, ", n);
}

int main(int argc, char* argv[])
{
    for (int i = 0; i < 16; ++i)
        Test(i);
    system("pause");  // Windows only: keeps the console window open
    return 0;
}

Output:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,

//With OpenMP
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

void Test(int n) {
    for (int i = 0; i < 10000; ++i) {
        //do nothing, just waste time
    }
    printf("%d, ", n);
}

int main(int argc, char* argv[])
{
    // The 16 iterations are distributed across the available threads
    #pragma omp parallel for
    for (int i = 0; i < 16; ++i)
        Test(i);
    system("pause");  // Windows only: keeps the console window open
    return 0;
}

(My laptop has 2 cores and 4 hardware threads.)

Output:

0,12,4,8,1,13,5,9,2,14,6,10,3,15,7,11,

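Here OpenMP split iterations 0-15 into four chunks: 0-3, 4-7, 8-11, and 12-15.

When the compiler encounters #pragma omp parallel for, it automatically splits the for loop that follows into N parts (by default, N is the number of hardware threads on the machine), assigns each part to one thread, and the threads execute in parallel.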

Getting the number of CPU cores and the thread ID

#include <iostream>
#include <omp.h>

int main() {
    int sum = 0;
    int a[10] = {1,2,3,4,5,6,7,8,9,10};
    // Number of processors. This is really the number of hardware threads:
    // on my 2-core/4-thread laptop it returns 4.
    int coreNum = omp_get_num_procs();
    int* sumArray = new int[coreNum];  // one partial sum per thread
    for (int i = 0; i < coreNum; i++)  // initialize every element to 0
        sumArray[i] = 0;
    // num_threads(coreNum) keeps thread IDs within the bounds of sumArray
    #pragma omp parallel for num_threads(coreNum)
    for (int i = 0; i < 10; i++)
    {
        int k = omp_get_thread_num();  // ID of the thread running this iteration
        sumArray[k] = sumArray[k] + a[i];
    }
    for (int i = 0; i < coreNum; i++)
        sum = sum + sumArray[i];
    std::cout << "sum: " << sum << std::endl;
    delete[] sumArray;
    return 0;
}

=================Using OpenMP on Ubuntu=====================================

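Hands-on FAQ:

*How do I run OpenMP programs on Linux?
> You only need a compiler that supports OpenMP, such as GCC 4.2 or later (some 4.1 builds shipped with Fedora Core apparently support it too), or ICC (version 9.1, which I use, supports it; I have not tried others).

*How do I determine whether the compiler supports OpenMP?
> Check whether omp.h exists in the include directory under the compiler's installation path.

*How do I recognize an OpenMP program?
> Check whether the source contains the following: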
> #include <omp.h>
> #pragma omp ...

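*How do I compile an OpenMP program?
> gcc -fopenmp [sourcefile] -o [destination file]
> icc -openmp [sourcefile] -o [destination file]
> (Use g++ instead of gcc for C++ sources; newer Intel compilers spell the option -qopenmp.)

*How do I run an OpenMP program?
> The compiled file can be executed directly, just like any ordinary executable.

*How do I set the number of threads?
> Method 1: call omp_set_num_threads(n); in the program
> Method 2: export OMP_NUM_THREADS=n;
> Each method has its use: the former applies only to that program, while the latter changes the thread count without recompiling.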

Example 1: Timing difference between parallel and sequential execution

Sequential Version:

#include<iostream>
#include<sys/time.h>
#include<unistd.h>
using namespace std;

// Runs a fixed busy loop and prints how long it took.
void test(int n)
{
    int a=0;
    struct timeval tstart,tend;
    double timeUsed;
    gettimeofday(&tstart,NULL);
    for(int i=0;i<1000000000;i++)
    {
        a=i+1;
    }
    gettimeofday(&tend,NULL);
    // elapsed time in microseconds
    timeUsed=1000000*(tend.tv_sec-tstart.tv_sec)+tend.tv_usec-tstart.tv_usec;
    cout<<n<<" Time="<<timeUsed/1000<<" ms"<<endl;
}

int main()
{
    struct timeval tstart,tend;
    double timeUsed;
    gettimeofday(&tstart,NULL);
    int j=0;
    // The four calls run one after another
    for(j=0;j<4;j++)
    {
        test(j);
    }
    gettimeofday(&tend,NULL);
    timeUsed=1000000*(tend.tv_sec-tstart.tv_sec)+tend.tv_usec-tstart.tv_usec;
    cout<<" Total Time="<<timeUsed/1000<<" ms"<<endl;
    return 0;
}

Parallel Version:

#include<iostream>
#include<sys/time.h>
#include<unistd.h>
#include<omp.h>
using namespace std;

// Runs a fixed busy loop and prints how long it took.
void test(int n)
{
    int a=0;
    struct timeval tstart,tend;
    double timeUsed;
    gettimeofday(&tstart,NULL);
    for(int i=0;i<1000000000;i++)
    {
        a=i+1;
    }
    gettimeofday(&tend,NULL);
    // elapsed time in microseconds
    timeUsed=1000000*(tend.tv_sec-tstart.tv_sec)+tend.tv_usec-tstart.tv_usec;
    cout<<n<<" Time="<<timeUsed/1000<<" ms"<<endl;
}

int main()
{
    struct timeval tstart,tend;
    double timeUsed;
    gettimeofday(&tstart,NULL);
    int j=0;
    // The four calls are distributed across threads; the loop variable j
    // is made private to each thread automatically
    #pragma omp parallel for
    for(j=0;j<4;j++)
    {
        test(j);
    }
    gettimeofday(&tend,NULL);
    timeUsed=1000000*(tend.tv_sec-tstart.tv_sec)+tend.tv_usec-tstart.tv_usec;
    cout<<" Total Time="<<timeUsed/1000<<" ms"<<endl;
    return 0;
}

Result:

Sequential version:

0 Time=2064.69 ms
1 Time=2061.11 ms
2 Time=2076.32 ms
3 Time=2077.93 ms
Total Time=8280.14 ms

Parallel version:

2 Time=2148.22 ms
3 Time=2151.72 ms
0 Time=2151.85 ms
1 Time=2151.77 ms
Total Time=2158.81 ms

------------------------------------------------------------------------------------------------------------------------------------------------------------

Example 2: Computing Pi by rectangle approximation

Sequential Version:
