Getting Started with OpenMP

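Introduction

OpenMP is a very easy-to-use framework for shared-memory parallel programming. It provides a set of simple APIs that free programmers from the complexity of concurrent programming, so they can concentrate on implementing the actual functionality. OpenMP works mainly through compiler directives together with its runtime library. In this article we introduce how to use some of the simple, introductory OpenMP directives.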
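Seeing How Easy OpenMP Is

Suppose we have a task: start four threads and have each one print hello world. Below we look at a C implementation using pthreads and a C++ implementation using the standard library, then compare their complexity with an OpenMP implementation.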
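C Implementation

```c
#include <stdio.h>
#include <pthread.h>

void *func(void *arg) {
    (void)arg;  // unused
    // pthread_t is an opaque type; on Linux/glibc it is an unsigned long, so cast it for printing
    printf("hello world from tid = %lu\n", (unsigned long)pthread_self());
    return NULL;
}

int main() {
    pthread_t threads[4];
    // start 4 threads, each running func
    for (int i = 0; i < 4; i++) {
        pthread_create(&threads[i], NULL, func, NULL);
    }
    // wait for all 4 threads to finish
    for (int i = 0; i < 4; i++) {
        pthread_join(threads[i], NULL);
    }
    return 0;
}
```

Compile the file above with: gcc <file> -lpthread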
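C++ Implementation

```cpp
#include <iostream>
#include <sstream>
#include <thread>

void func() {
    // std::thread::id is not an integer, so it cannot be printed with printf("%ld");
    // build the whole line first and write it with a single <<, which makes
    // interleaving between threads less likely
    std::ostringstream line;
    line << "hello world from " << std::this_thread::get_id() << "\n";
    std::cout << line.str();
}

int main() {
    std::thread threads[4];
    for (auto &t : threads) {
        t = std::thread(func);  // start a thread running func
    }
    for (auto &t : threads) {
        t.join();               // wait for it to finish
    }
    return 0;
}
```

Compile the file above with: g++ <file> -lpthread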
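OpenMP Implementation

```c
#include <stdio.h>
#include <omp.h>

int main() {
    // #pragma marks a compiler directive: the compiler must treat the following parallel region specially.
    // "omp parallel" means the {} block below is a parallel region, and num_threads(4) means
    // 4 threads execute the code inside {}, so the result is the same as the two programs above.
    #pragma omp parallel num_threads(4)
    {
        // omp_get_thread_num returns the calling thread's number within the team (0..3 here)
        printf("hello world from tid = %d\n", omp_get_thread_num());
    }
    return 0;
}
```

Compile the file above with: gcc <file> -fopenmp. Whenever you use OpenMP directives, you must add -fopenmp to the compiler options.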
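As the code above shows, writing concurrent programs with OpenMP is indeed less complex than with pthreads or C++ threads. Compared with other ways of building parallel programs, OpenMP lets you focus on the actual application logic instead of worrying about how threads are started and stopped behind the scenes. OpenMP handles many of these details for us and makes the program behave the way we would intuitively expect.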
How OpenMP Works

Above we wrote a very simple OpenMP program that uses 4 different threads to print hello world. Let's analyze its execution flow carefully:

[Figure: execution flow of the hello world program]
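In an OpenMP program, you can divide the program into a series of parallel regions. Inside a parallel region the code runs concurrently, but outside the parallel regions there is no concurrency: only the master thread executes. The whole process is shown in the figure below: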

[Figure: execution alternating between the master thread and parallel regions]
Now let's verify this process with a program:

```c
#include <stdio.h>
#include <omp.h>
#include <unistd.h>

int main() {
    #pragma omp parallel num_threads(4)
    {
        printf("parallel region 1 thread id = %d\n", omp_get_thread_num());
        sleep(1);
    }
    printf("after parallel region 1 thread id = %d\n", omp_get_thread_num());

    #pragma omp parallel num_threads(4)
    {
        printf("parallel region 2 thread id = %d\n", omp_get_thread_num());
        sleep(1);
    }
    printf("after parallel region 2 thread id = %d\n", omp_get_thread_num());

    #pragma omp parallel num_threads(4)
    {
        printf("parallel region 3 thread id = %d\n", omp_get_thread_num());
        sleep(1);
    }
    printf("after parallel region 3 thread id = %d\n", omp_get_thread_num());
    return 0;
}
```

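One possible output of the program is shown below (many other orderings are possible: because this is a multithreaded program, the order in which the threads print is nondeterministic):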

```text
parallel region 1 thread id = 0
parallel region 1 thread id = 3
parallel region 1 thread id = 1
parallel region 1 thread id = 2
after parallel region 1 thread id = 0
parallel region 2 thread id = 0
parallel region 2 thread id = 2
parallel region 2 thread id = 3
parallel region 2 thread id = 1
after parallel region 2 thread id = 0
parallel region 3 thread id = 0
parallel region 3 thread id = 1
parallel region 3 thread id = 3
parallel region 3 thread id = 2
after parallel region 3 thread id = 0
```

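From the output above we can see that the thread with id = 0 is the master thread. Inside a parallel region the output has no fixed order, but outside the parallel regions it is ordered. At the start of a parallel region the program begins executing concurrently, and at the end of the region there is an implicit synchronization point (a barrier): the program continues only after all threads have reached it. Now, looking back at the earlier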