Using the condition variable `condition_variable`, and its pitfalls

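While reading code recently, I noticed that condition variables are the most common tool for implementing throttling, blocking, and similar features in multithreaded code.

Let's start with a basic understanding of condition variables.

Condition variable basics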

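The condition variable std::condition_variable is defined in the header <condition_variable>.

A condition variable blocks one or more threads until another thread modifies a variable shared between the threads (the condition) and notifies the blocked threads through the condition_variable. The blocked threads can then carry on with their work.

It follows that a condition variable is used from two sides:

- to notify blocked threads that the shared variable has changed
- to block a thread until it is woken up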

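Notifying

This takes three steps:

- Acquire the std::mutex, typically with std::lock_guard.
- Modify the shared variable while holding the lock.
- Call notify_one or notify_all on the std::condition_variable to wake the blocked threads. The lock does not have to be held for the notification itself.

The corresponding function prototypes are: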

void notify_one() noexcept;
void notify_all() noexcept;
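
To make the difference between the two calls concrete, here is a minimal sketch. The helper name wakeAllWaiters and all variable names are hypothetical and exist only for illustration. It starts several waiting threads and releases all of them with a single notify_all; with notify_one, only one already-blocked waiter would be woken per call.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not from the article): starts `waiters` threads that
// block until `go` becomes true, wakes them with one notify_all, and returns
// how many of them resumed.
int wakeAllWaiters(int waiters) {
    std::mutex m;
    std::condition_variable cv;
    bool go = false;   // shared condition, protected by m
    int resumed = 0;   // also protected by m

    std::vector<std::thread> threads;
    for (int i = 0; i < waiters; ++i) {
        threads.emplace_back([&] {
            std::unique_lock<std::mutex> l(m);
            cv.wait(l, [&] { return go; });
            ++resumed;  // wait returned with m locked, so this is safe
        });
    }

    {
        std::lock_guard<std::mutex> l(m);  // modify the condition under the lock
        go = true;
    }
    cv.notify_all();  // notify after releasing the lock

    for (auto& t : threads) t.join();
    return resumed;
}
```

Calling wakeAllWaiters(4) should return 4, because every waiter either sees go == true on its first predicate check or is woken by notify_all.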

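Blocking

This takes three steps:

- Acquire a std::unique_lock<std::mutex> on the same mutex that the other threads use to protect the shared variable.
- Call wait, wait_for, or wait_until. The call atomically releases the mutex and blocks the thread.
- When the condition variable is notified, a timeout expires, or a spurious wakeup occurs, the thread stops blocking and automatically reacquires the mutex. It should then check the condition and go back to waiting if the wakeup was spurious.

The corresponding function prototypes are: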

void wait(unique_lock<mutex>& lock);
template<class Pred>
    void wait(unique_lock<mutex>& lock, Pred pred);
template<class Clock, class Duration>
    cv_status wait_until(unique_lock<mutex>& lock, const chrono::time_point<Clock, Duration>& abs_time);
template<class Clock, class Duration, class Pred>
    bool wait_until(unique_lock<mutex>& lock, const chrono::time_point<Clock, Duration>& abs_time, Pred pred);
template<class Rep, class Period>
    cv_status wait_for(unique_lock<mutex>& lock, const chrono::duration<Rep, Period>& rel_time);
template<class Rep, class Period, class Pred>
    bool wait_for(unique_lock<mutex>& lock, const chrono::duration<Rep, Period>& rel_time, Pred pred);
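
As an illustration of the timed overloads, the sketch below uses wait_for with a predicate. The helper waitForFlag and its parameters are hypothetical names chosen for this example. The predicate overload returns the value of the predicate at the moment it stops waiting, so a false return means the timeout expired while the condition was still unmet.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper (not from the article): waits up to `timeout` for a
// flag. If `signal` is true, a second thread sets the flag and notifies.
// Returns what wait_for returned: true if the flag was seen, false on timeout.
bool waitForFlag(std::chrono::milliseconds timeout, bool signal) {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;  // shared condition, protected by m

    std::thread sender([&] {
        if (!signal) return;  // simulate a sender that never notifies
        {
            std::lock_guard<std::mutex> l(m);
            ready = true;
        }
        cv.notify_one();
    });

    bool result;
    {
        std::unique_lock<std::mutex> l(m);
        // Returns the predicate's value once it is true or the timeout expires.
        result = cv.wait_for(l, timeout, [&] { return ready; });
    }  // release m before joining so the sender can never be blocked on it
    sender.join();
    return result;
}
```

With signal set to false the call returns false after roughly the given timeout; with signal set to true and a generous timeout it returns true as soon as the flag is observed.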

Usage example

Here is the basic usage template:

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex lock;
std::condition_variable condVar;

bool dataReady{false};

void waitingForWork() {
    std::cout << "Waiting ..." << std::endl;
    std::unique_lock<std::mutex> l(lock);
    condVar.wait(l, []{return dataReady;});           // (4)
    std::cout << "Running ..." << std::endl;
}

void setDataReady() {
    {
        std::lock_guard<std::mutex> l{lock};
        dataReady = true;
    }
    std::cout << "Data prepared, notify one" << std::endl;
    condVar.notify_one();                             // (3)
}

int main() {
    std::cout << "==========Begin==========" << std::endl;

    std::thread t1(waitingForWork);                    // (1)
    std::thread t2(setDataReady);                      // (2)

    t1.join();
    t2.join();

    std::cout << "===========End===========" << std::endl;
}

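How does the synchronization work here? The program creates two threads, t1 (1) and t2 (2), which run waitingForWork and setDataReady respectively. setDataReady uses the condition variable condVar to notify (3) the other thread that it has finished preparing the data. waitingForWork acquires the lock and waits for that notification (4).

Note that the sender and the receiver must use the same mutex. For the sender, std::lock_guard is sufficient, because it calls lock and unlock only once. The receiver must use std::unique_lock, because wait unlocks and relocks the mutex repeatedly.

Note: compile with the -pthread option to avoid thread-related errors.

The output is: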

==========Begin==========
Waiting ...
Data prepared, notify one
Running ...
===========End===========

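This raises a question: wait can also be called without the predicate pred, so why make the workflow this complicated?

There is a basic rule: a wait without a condition may miss a wakeup, or may wake up only to find there is nothing to do. What does this mean? Condition variables can fall victim to two very serious problems: lost wakeups and spurious wakeups.

Lost wakeups and spurious wakeups

- Lost wakeup: the sender sends its notification before the receiver enters the waiting state, so the notification is lost. A condition variable is described as a synchronization primitive that acts at the same moment: "The condition_variable class is a synchronization primitive that can be used to block a thread, or multiple threads at the same time, ...". The notification itself is not stored anywhere, so once it is lost the receiver waits forever.
- Spurious wakeup: the receiver may wake up even though no notification was sent.

The workflow of a wait is described in detail below: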

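The wait workflow

At the start of the wait, the thread locks the mutex and then checks the predicate []{return dataReady;}. (In programming, a predicate is a conditional expression that evaluates to true or false.)

If the predicate evaluates to:
- true: the thread continues its work
- false: condVar.wait() unlocks the mutex and puts the thread into the waiting (blocked) state

If the condition variable condVar is in the waiting state and is notified or woken spuriously, the following steps take place:
- The thread is unblocked and reacquires the mutex
- The thread checks the predicate
- If the predicate evaluates to:
  - true: the thread continues its work
  - false: condVar.wait() unlocks the mutex and puts the thread back into the waiting (blocked) state

That looks rather complicated!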
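
The workflow above boils down to a short loop. The sketch below spells it out with a hand-written helper; waitWithPredicate and runHandshake are hypothetical names used only for illustration, and the helper is meant to mirror the behavior of wait(lock, pred), not to replace it.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper (not from the article) that mirrors wait(lock, pred).
// Precondition: `l` is locked by the calling thread.
template <class Pred>
void waitWithPredicate(std::condition_variable& cv,
                       std::unique_lock<std::mutex>& l, Pred pred) {
    while (!pred()) {  // predicate is always checked with the mutex held
        cv.wait(l);    // atomically unlocks and blocks; relocks before returning
    }
}

// Runs one receiver and one sender; returns true once the receiver has
// observed dataReady == true.
bool runHandshake() {
    std::mutex m;
    std::condition_variable cv;
    bool dataReady = false;  // shared condition, protected by m
    bool seen = false;

    std::thread receiver([&] {
        std::unique_lock<std::mutex> l(m);
        waitWithPredicate(cv, l, [&] { return dataReady; });
        seen = dataReady;
    });
    std::thread sender([&] {
        {
            std::lock_guard<std::mutex> l(m);
            dataReady = true;
        }
        cv.notify_one();
    });

    receiver.join();
    sender.join();
    return seen;
}
```

A spurious wakeup simply causes one more trip around the loop: the predicate is still false, so the thread goes back to waiting.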

So let's look at the case without a predicate.

Without a predicate

What happens if we remove the predicate from the example above?

//conditionVariablesWithoutPredicate.cpp
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex lock;
std::condition_variable condVar;

void waitingForWork() {
    std::this_thread::sleep_for(std::chrono::seconds(2));
    std::cout << "Waiting ..." << std::endl;
    std::unique_lock<std::mutex> l(lock);
    condVar.wait(l);                                     //(1)
    std::cout << "Running ..." << std::endl;
}

void setDataReady() {
    std::this_thread::sleep_for(std::chrono::seconds(1));
    std::cout << "Data prepared, notify one" << std::endl;
    condVar.notify_one();                                //(2)
}

int main() {
    std::cout << "==========Begin==========" << std::endl;

    std::thread t1(waitingForWork);
    std::thread t2(setDataReady);

    t1.join();
    t2.join();

    std::cout << "===========End===========" << std::endl;
}

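Now wait is called without a predicate, and the synchronization looks quite simple. Unfortunately, this version suffers from a lost wakeup, and the result below shows the lost wakeup turning into a deadlock: the receiver blocks forever. To reproduce the lost wakeup reliably, I added different delays to the two threads, so the sender notifies after 1 second while the receiver only starts waiting after 2 seconds. If you do not trust the first template, you can add the same delays to it and test it.

The program prints the following and then never terminates: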

==========Begin==========
Data prepared, notify one
Waiting ...
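
For contrast, here is a minimal sketch of the same ordering with a predicate. The function name notifyBeforeWait is hypothetical. The sender finishes before the receiver even starts, so the notification reaches no waiter at all, yet the receiver does not hang because the predicate is already true when wait checks it.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper (not from the article): the sender runs to completion
// before the receiver starts. Returns true if the receiver still gets past
// its wait.
bool notifyBeforeWait() {
    std::mutex m;
    std::condition_variable cv;
    bool dataReady = false;  // shared condition, protected by m

    std::thread sender([&] {
        {
            std::lock_guard<std::mutex> l(m);
            dataReady = true;
        }
        cv.notify_one();  // nobody is waiting yet: this notification is lost
    });
    sender.join();  // force the "notify first, wait later" ordering

    bool resumed = false;
    std::thread receiver([&] {
        std::unique_lock<std::mutex> l(m);
        // The predicate is checked before blocking; it is already true,
        // so wait returns immediately without needing a notification.
        cv.wait(l, [&] { return dataReady; });
        resumed = true;
    });
    receiver.join();
    return resumed;
}
```

The state lives in dataReady, not in the notification, which is why losing the notification does no harm here.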

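Well, the lesson is learned the hard way: the predicate is necessary. Is there no simpler way?

An `atomic` predicate

You may have noticed that dataReady is just a boolean. What if we make it an atomic boolean and drop the lock on the sender side?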

//conditionVariablesAtomic.cpp
#include <atomic>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex lock;
std::condition_variable condVar;

std::atomic<bool> dataReady{false};

void waitingForWork() {
    std::cout << "Waiting ..." << std::endl;
    std::unique_lock<std::mutex> l(lock);
    condVar.wait(l, []{return dataReady.load();});
    std::cout << "Running ..." << std::endl;
}

void setDataReady() {
    dataReady = true;
    std::cout << "Data prepared, notify one" << std::endl;
    condVar.notify_one();
}

int main() {
    std::cout << "==========Begin==========" << std::endl;

    std::thread t1(waitingForWork);
    std::thread t2(setDataReady);

    t1.join();
    t2.join();

    std::cout << "===========End===========" << std::endl;
}

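Because dataReady is no longer protected by the mutex on the sender side, this version is simpler than the first one. But it contains a race condition that can cause a deadlock.

The wait expression is equivalent to the following lines: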

std::unique_lock<std::mutex> l{lock};
while (![]{ return dataReady.load(); }()) {
    //time window(1)
    condVar.wait(l);
}

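Even if dataReady is atomic, it must be modified while holding the mutex. Otherwise the change and the notification are not correctly synchronized with the waiting thread, and this race condition can cause a deadlock.

Suppose the notification is sent while the condition variable condVar is inside the wait expression but not yet in the waiting state. That is, the receiving thread is executing the source at the time window marked (1). The notification is lost, the thread then enters the waiting state, and it will most likely sleep forever. (One way this can happen: a spurious wakeup occurs, the thread evaluates the predicate and finds it false, and the notification arrives just before the thread goes back to waiting.)

If dataReady is protected by the mutex, this cannot happen. The receiver holds the mutex throughout the window between checking the predicate and blocking in wait, so the sender can only change dataReady either before the check, in which case the receiver sees true and does not wait, or after the receiver is already waiting, in which case the notification wakes it up. In other words, once the receiver has seen dataReady as false, it is guaranteed to be in the waiting state when the change is made; after the change the notification is sent and the receiver can continue.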
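
A minimal sketch of the corrected variant follows, assuming we want to keep the atomic flag. The function name runAtomicHandshake is hypothetical. The only change compared with the racy version is that the sender takes the mutex while storing to dataReady.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper (not from the article): same handshake as the atomic
// example, but the sender modifies the flag while holding the mutex.
bool runAtomicHandshake() {
    std::mutex m;
    std::condition_variable cv;
    std::atomic<bool> dataReady{false};

    std::thread receiver([&] {
        std::unique_lock<std::mutex> l(m);
        cv.wait(l, [&] { return dataReady.load(); });
    });

    std::thread sender([&] {
        {
            // The receiver holds m between its predicate check and the moment
            // it blocks, so this store cannot slip into that time window.
            std::lock_guard<std::mutex> l(m);
            dataReady = true;
        }
        cv.notify_one();
    });

    receiver.join();
    sender.join();
    return dataReady.load();
}
```

Once the store happens under the mutex, the atomic no longer buys anything for this handshake; a plain bool, as in the first template, works just as well.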