C++ Programming Lab Project 3: Othello and a UCT-Based AI

Reposted article; original published 2018-06-20 14:41. Category: 學在NJU / C

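In this post I summarize what I learned in this lab about how the UCT algorithm works and how to implement it.

First, the reference articles:

https://blog.csdn.net/u014397729/article/details/27366363 A blog post that implements a Connect Four AI with UCT. It gives the complete UCT pseudocode and comes with ready-to-run code for reference.

https://blog.csdn.net/yw2978777543/article/details/70799799 This article explains how UCT works in more depth, using mathematical notation and pseudocode.

https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/ This English article has a clear diagram that gives an intuitive picture of the UCT algorithm.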

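Next, a few words about the specific implementation and rules of Othello in this lab.

First, each turn has only five seconds of thinking time. This is to prevent students from letting a pruning algorithm compute for too long.

Also, to prevent students from simply copying an evaluation table from the internet when writing a pruning algorithm, the following rule was added: a player may not place a piece on a square adjacent to the square played in the previous turn (in the code below, notConj treats the four orthogonally neighboring squares as adjacent).

In the figure, the black star is the move just played, X marks the squares that cannot be played, and White's playable squares are marked with ♂ (don't mind the details).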

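The data structure is designed as follows:

class Board {
    // Stores the board, plus the color and position of the most recent move
    int chessboard[8][8], latestColor;
    chesspos latestStep;

public:
    // Board manipulation
    Board(); // Initialize: place the four starting pieces in the center
    bool isEnd();  // The game is over when neither Black nor White has a playable square
    bool notConj(chesspos a, chesspos b); // True if a and b are not adjacent; used when checking whether a square is playable
    bool search(chesspos p, int color, int d); // Check whether direction d satisfies the flipping condition
    void rev(chesspos p, int color, int d); // Flip pieces in direction d once search has succeeded
    void oneMove(chesspos p); // Given a position assumed to be legal, carry out the whole move
    vector<chesspos> getValidPos(int, chesspos); // Given a color and the previous move's position, return the playable squares

    // Operations used by the UCT algorithm
    int calScore(); // Compute the score
    chesspos randomMove(); // Play a random move
    int simulate(); // Play out the rest of the game with random moves

    // Operations required by the assignment itself; these can be ignored
    void graphBoard();
    void graphBoard(string path);
    void printScore(string path);
};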

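All right, let's start discussing the UCT algorithm.

To write a UCT-based AI, you first have to understand what UCT is actually doing. This matters a lot: only once you understand what is going on can you build an implementation on the pseudocode framework. Otherwise you are likely to end up with something that is neither one thing nor the other, or even with an AI that plays on the opponent's behalf.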

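The chimpanzee algorithm: only slightly better than random moves

When I first met UCT, with only a half understanding of it, I came up with the following scheme:

To compute the area of a circle inscribed in a square, we can scatter beans at random, count the beans that land inside the circle, and compare that with the total number of beans; the ratio gives the area of the circle.
By the same reasoning, I can pick a random square among all the playable ones and then finish the game with random moves. With enough simulations, the estimated win rates of the different squares will differ. Pick the square with the best win rate, and it is smooth sailing all the way to victory!

It may be that my evaluation was badly designed (I used the UCT evaluation formula without understanding what it meant), which made the algorithm barely different from random play. Either way, the chimpanzee is not a good algorithm. Simulations take time, and five seconds of random results are rarely enough for an advantage to show. It is like scattering only a hundred beans over a one-meter square to measure the inscribed circle: of course the result is inaccurate. So how can we improve on it?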
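To make the bean analogy concrete, here is a minimal sketch of the estimate. The function name estimateCircleArea and its seed parameter are my own choices for illustration, not part of the lab code. For a one-meter square the true area is π/4 ≈ 0.785; with 100 beans the standard error of the estimate is about 0.04, while a million beans bring it down to about 0.0004.

```cpp
#include <cassert>
#include <random>

// Estimate the area of the circle inscribed in a side x side square by
// scattering `beans` random points and counting those inside the circle.
// (Hypothetical helper for illustration only.)
double estimateCircleArea(double side, int beans, unsigned seed) {
    assert(side > 0 && beans > 0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> coord(-side / 2, side / 2);
    const double r = side / 2;
    int inside = 0;
    for (int i = 0; i < beans; ++i) {
        double x = coord(gen), y = coord(gen);
        if (x * x + y * y <= r * r) ++inside;   // bean landed in the circle
    }
    // fraction of beans in the circle times the area of the square
    return side * side * inside / beans;
}
```

Calling estimateCircleArea(1.0, 100, seed) with a few different seeds shows how much a hundred-bean estimate jumps around, which is the same problem the chimpanzee has with a five-second budget.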

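The multi-armed bandit problem and the UCB algorithm it leads to

Forget Othello for a moment and think about playing slot machines. In front of you are several machines (say, 4) that look identical and are played the same way: pull the lever, and with some probability you win a coin. Each machine has its own payout probability, and the number of pulls you get is limited, say 233. How do you design a strategy so that these 233 pulls earn the highest reward?

First, intuition says to pull each machine once to see how they behave.
Then, suppose that of the four machines A, B, C and D only B gave you a coin. What do you choose next? Judging by the current situation, you keep pulling B; after all, statistically speaking, B's observed payout rate is 100%.
But the next two pulls pay nothing. B's observed rate is still higher than that of A, C and D (33% versus 0%), yet you have reason to suspect that a better choice is hiding among A, C and D and simply has not shown itself because the sample is too small. So you put B aside for now and try the other machines.

This sounds quite reasonable, and it is exactly the idea UCB uses when dealing with a game tree:

Use the results of many simulations to find the node with the highest probability, and spend most of your effort on that node to avoid waste. This is called exploitation.
But also look after the nodes that have been "neglected", so as not to miss an opportunity. This is called exploration.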

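So how do we decide which node to choose? This is where the UCB formula comes in. For a problem like the multi-armed bandit, a formula of the following kind can be designed:

Where:

Cw is the node's score.
Cv is the number of visits to this node.
Pv is the total number of visits over all nodes.
C is a scaling coefficient. The larger it is, the more weight goes to exploration; the smaller it is, the more weight goes to exploitation.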
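The formula image did not survive in this copy of the post. Judging from the bestChild code later on, which computes score / total + cof * sqrt(log(parent total) / total), it presumably has the following form (a reconstruction, so treat it as an assumption):

```latex
\mathrm{UCB} = \frac{C_w}{C_v} + C \sqrt{\frac{\ln P_v}{C_v}}
```

The first term is the average reward observed so far (exploitation); the second term is large for nodes that have been visited rarely (exploration). Some references write the second term as \sqrt{2 \ln P_v / C_v}; that constant factor can be absorbed into C.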

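The key here is how to tune the scaling coefficient. It usually has to be determined by experiment. The reference articles suggest 1.38 as one option, which is the value obtained by solving C*sqrt(2)==1.96. As for why 1.96, I cannot really say; it does coincide with the two-sided 95% critical value of the standard normal distribution, which may be where it comes from.

In any case, under this strategy you keep a count of the total number of pulls and use it as Pv. For each machine, let Cw be the coins it has paid and Cv the number of times it has been pulled. Before each pull, compute the UCB value of every machine and pull the one with the largest value. If some machine has never been pulled, we treat its UCB value as infinite, so we always try those machines first; after all, they hold unlimited possibilities.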
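The strategy just described can be sketched as a small simulation. This is only an illustration under my own assumptions: the Machine struct, the functions ucb and playUcb, and the payout probabilities used when calling it are all made up for the example, and the formula is the one used by bestChild later in this post.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <random>
#include <vector>

struct Machine {
    double payout;   // true payout probability, hidden from the player
    int coins;       // Cw: coins won so far
    int pulls;       // Cv: times this machine has been pulled
};

// UCB value of one machine; an unpulled machine counts as infinitely promising.
double ucb(const Machine& m, int totalPulls, double c) {
    if (m.pulls == 0) return std::numeric_limits<double>::infinity();
    return 1.0 * m.coins / m.pulls + c * std::sqrt(std::log(totalPulls) / m.pulls);
}

// Play `budget` pulls with the UCB rule and return the total coins won.
int playUcb(std::vector<Machine>& machines, int budget, double c, unsigned seed) {
    assert(!machines.empty() && budget >= 0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    int won = 0;
    for (int t = 0; t < budget; ++t) {          // t = Pv, pulls made so far
        std::size_t best = 0;
        for (std::size_t i = 1; i < machines.size(); ++i)
            if (ucb(machines[i], t, c) > ucb(machines[best], t, c)) best = i;
        Machine& m = machines[best];
        ++m.pulls;
        if (uni(gen) < m.payout) { ++m.coins; ++won; }
    }
    return won;
}
```

For example, with four machines initialized as {0.2, 0, 0}, {0.5, 0, 0}, {0.3, 0, 0}, {0.7, 0, 0}, calling playUcb(machines, 233, 1.38, seed) first pulls every machine once and then spends the remaining pulls according to the UCB values; inspecting the pulls field afterwards shows how the budget was split.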

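UCT, the combination of UCB with a Monte Carlo tree: an intuitive explanation

Because a slot machine is a sealed box, we cannot see what computation goes on inside it after the lever is pulled. In Othello, however, we can watch the random playout as it happens. So we can improve on UCB so that our simulation process can be recorded. This calls for a tree: each node represents a board state and also records which side made the move leading to that state and that side's win rate. It follows that the root node at the start of the game is white, because the next move belongs to Black, and the root's children are black.

The question is: how exactly does UCT record simulations? If you assemble code from the pseudocode without really understanding the algorithm, you can easily misread the pseudocode. My roommate and I both misunderstood it at first. What we thought was this: UCT talks about simulation and backup, so presumably, during a simulation, before every move we look for the point with the largest UCB value, and if that point is not yet in the tree we create a node with a win rate of 0/0. After roughly 60 moves the board is full and the game tree has one long branch. Finally, according to the simulation result, we score the nodes on that branch, obtaining a chain of nodes with win rates of 0/1 or 1/1, and call that the backup step.

The figure above shows this incorrect understanding: the very first simulation of the game already creates a whole chain of nodes. Dashed lines point to feasible nodes that have not been created yet.

But that is not how it works. This puzzled me until I looked at the diagram in the English article cited at the top of this post, and then everything clicked. The UCT process actually goes like this: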

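First, we take the node corresponding to the current board state as the root of the game tree.
- In each UCT search, we check whether the node we have reached is one that is not yet fully expanded. This is like checking whether there is a slot machine whose lever has not been pulled yet.
- If the node is fully expanded, we compute the UCB values of its children and move down to the child with the largest value.
Eventually one of two things happens: we reach a node that is not fully expanded, or we reach a terminal node.
- A terminal node is easy: we simply back up the game result node by node along the path we just came down.
- Otherwise, we have in effect found a slot machine that has not been pulled. We pick one and pull it, that is, we create a new 0/0 node for one of the untried moves and run a random simulation from that state. The simulation just keeps playing random legal moves until the game ends, and nothing is recorded during it. The result of the simulation is then backed up upward, node by node, starting from the newly created 0/0 node.

Put abstractly (this comes from a note in the first reference article), we are finding the main path of the current UCT tree, taking the newly created tail node of that path, simulating from that tail node, and backing up the score along the new main path. Baidu Baike has a figure that shows very intuitively what the main path is.

What was described above is a single UCT search. A complete run of the UCT algorithm performs many UCT searches within the time limit. Each search changes the structure of the game tree and influences where the main path of the next search goes. The more searches are done, the more accurate the result.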
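The four steps (follow the main path, expand one node, simulate, back up) can be sketched on a game much smaller than Othello. The following is a minimal illustration and not the code used in this project: the toy game (one pile of stones, each player takes 1 to 3, whoever takes the last stone wins), the names Node, bestChild and uctMove, and the choice to expand untried moves in a fixed order are all my own assumptions. As in this post, each node stores the player who made the move into it and is credited when that player wins.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <random>
#include <vector>

struct Node {
    int take;      // move that led to this node (0 for the root)
    int mover;     // player who made that move
    int stones;    // stones left after the move
    int total = 0, score = 0;   // visits, and wins for `mover`
    Node* parent;
    std::vector<std::unique_ptr<Node>> child;
    Node(int t, int m, int s, Node* p) : take(t), mover(m), stones(s), parent(p) {}
    int legalMoves() const { return stones < 3 ? stones : 3; }
};

// Child with the largest UCB value; c = 0 means pure exploitation.
Node* bestChild(Node* n, double c) {
    Node* best = nullptr;
    double bestUcb = -1e18;
    for (auto& ch : n->child) {
        double u = 1.0 * ch->score / ch->total
                 + c * std::sqrt(std::log(n->total) / ch->total);
        if (u > bestUcb) { bestUcb = u; best = ch.get(); }
    }
    return best;
}

// Run `iterations` UCT searches from a pile of `stones` with player 0 to move
// and return how many stones the most promising move takes.
int uctMove(int stones, int iterations, double c, unsigned seed) {
    assert(stones > 0 && iterations > 0);
    std::mt19937 gen(seed);
    Node root(0, 1, stones, nullptr);   // root belongs to player 1, so player 0 moves next
    for (int it = 0; it < iterations; ++it) {
        // 1. Selection: follow the main path while nodes are fully expanded.
        Node* tail = &root;
        while (tail->stones > 0 && (int)tail->child.size() == tail->legalMoves())
            tail = bestChild(tail, c);
        // 2. Expansion: add one new 0/0 node for the next untried move.
        if (tail->stones > 0) {
            int take = (int)tail->child.size() + 1;
            tail->child.emplace_back(new Node(take, 1 - tail->mover, tail->stones - take, tail));
            tail = tail->child.back().get();
        }
        // 3. Simulation: random playout; nothing is recorded in the tree.
        int left = tail->stones, last = tail->mover;
        while (left > 0) {
            int maxTake = left < 3 ? left : 3;
            left -= std::uniform_int_distribution<int>(1, maxTake)(gen);
            last = 1 - last;
        }
        // 4. Backup: the player who took the last stone wins.
        for (Node* n = tail; n != nullptr; n = n->parent) {
            ++n->total;
            if (n->mover == last) ++n->score;
        }
    }
    return bestChild(&root, 0)->take;
}
```

In this game the player to move loses against perfect play when the pile is a multiple of 4, so from 5 stones the good move is to take 1 and from 6 stones to take 2; with enough iterations uctMove should settle on those moves. Note that only one node is added to the tree per search, which is exactly the point that my roommate and I had misunderstood.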

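If the main path leads straight to a terminal node, the result is simply backed up directly.

This figure shows the most common case. A node that is not fully expanded is found on the main path, so a new 0/0 node is created from one of its feasible moves (you can see that a dashed line still connects to another node; if a later main path leads here, that node will be created).

Suppose the simulation ends in a win for Black. Then, going back along the main path from this 0/0 node all the way up to the root node of the current board, the win rate of every node is updated.

That explains how the UCT algorithm proceeds. So how should it be written in code?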

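Implementing the UCT algorithm in code

First the data structures:

class Node {
public:
    chesspos pos; // The move that led to this state; (-1,-1) if the previous turn was a pass
    int total, score; // Win-rate information of the node
    int color; // Color of the side that made the move
    Node* parent; 
    vector<Node*> child; 
    vector<chesspos> validPos; // Every node stores its playable squares when it is created, which makes it easy to tell whether it is fully expanded and to find expandable moves quickly
    Node(chesspos p, int c, Node* par, vector<chesspos> v);
};

class Tree {
    Node *subroot, *tail; // I originally wanted to reuse the search tree, so there was also a root that kept the opening node, but that is not actually needed because this algorithm does not reuse search results
    int ownColor; // Our own color, used for recording the win rate
public:
    // These are covered in detail below
    Tree(int ownc); 
    Node* expend(Board board); // expand the tail node
    void nextnode(chesspos nextp, Board board); // includes construction of a node that does not exist yet
    Node* bestChild(Node * tarRoot, double cof);
    Board getTail(Board board); // tree policy
    void backup(int result);

    // The next two can be ignored; they are functions required specifically by the assignment.
    void printInfo(); 
    void newTurn();
};

Besides these operations on the UCT tree, simulate() from the Board class is also used.

We will skip basic matters such as initialization and first look at how the main part of the UCT algorithm works: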

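// Our turn has come...
// The tree is Tree UCTtree
// The current board is Board b
s = clock();
n = clock();
while ((n - s) * 1000 / CLOCKS_PER_SEC < 4750) { // stop at 4.75 s to stay within the 5 s limit; clock() counts ticks, so convert to ms
    UCTtree.backup(UCTtree.getTail(b).simulate());
    n = clock();
}
// Play a move according to the search result...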

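It looks rather simple, because the most important steps are folded into a single statement. Let's go through it step by step.

First comes getTail. Since we do not store the board in the nodes, this function takes a board whose state is the one represented by the current root node.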

Board Tree::getTail(Board board) {
    tail = subroot;
    while (!board.isEnd()) {
        int vs = tail->validPos.size(), cs = tail->child.size();
        if (vs != cs) { // not fully expanded: add one new node and stop
            tail = expend(board);
            board.oneMove(tail->pos);
            break;
        }
        else { // fully expanded: follow the child with the largest UCB value
            tail = bestChild(tail);
            board.oneMove(tail->pos);
        }
    }
    return board;
}

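As you can see, if the main path leads all the way to the end of the game, the while loop exits and a finished board is returned. Otherwise, that is, when vs > cs, a node is expanded based on the current board, the move for that node is played, and the board is returned.

Two functions used in getTail have not been explained yet: expend and bestChild.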

Node* Tree::expend(Board board) {
    Node* newNode;
    vector<chesspos> possiblePos;
    bool matched;
    // The loops below collect the positions in validPos that are not yet among the children
    for (auto v : tail->validPos) {
        matched = false;
        for (auto c : tail->child) {
            if (v == c->pos) {
                matched = true;
                break;
            }
        }
        if (!matched) possiblePos.push_back(v);
    }
    int index = rand() % possiblePos.size();
    board.oneMove(possiblePos[index]);
    newNode = new Node(possiblePos[index], !(bool)tail->color, tail, board.getValidPos()); // As you can see, the node stores its playable squares when it is created.
    tail->child.push_back(newNode); // Add the new node to tail's children. In fact, the tail = expend(board) in getTail could be merged into expend; that is an implementation detail.
    return newNode;
}

This code should be fairly intuitive: it expands the search tree by one new node and updates the board accordingly.
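The double loop in expend rescans validPos against the children on every expansion. One possible alternative, shown only as a sketch under my own assumptions, is to shuffle the valid moves once when a node is created and then hand them out one at a time. The UntriedMoves helper and its member names are hypothetical and not part of the original Tree class.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <utility>
#include <vector>

typedef std::pair<int, int> chesspos;

// Hypothetical helper: keeps the moves of one node that have not been
// expanded yet.
struct UntriedMoves {
    std::vector<chesspos> moves;

    // Shuffle once, when the node is created.
    UntriedMoves(std::vector<chesspos> valid, std::mt19937& gen) : moves(std::move(valid)) {
        std::shuffle(moves.begin(), moves.end(), gen);
    }

    bool fullyExpanded() const { return moves.empty(); }

    // Hand out the next untried move; because of the shuffle this is a
    // uniformly random choice among the remaining ones.
    chesspos next() {
        assert(!moves.empty());
        chesspos p = moves.back();
        moves.pop_back();
        return p;
    }
};
```

With such a helper stored in each node, the fully-expanded test in getTail would become a call to fullyExpanded() instead of comparing validPos.size() with child.size(). For a board with at most thirty-odd moves the original double loop is cheap anyway, so this is a matter of taste rather than a necessary change.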

Next is bestChild. You can see that the scaling coefficient cof is 150; this is explained a little later.

Node* Tree::bestChild(Node *tarRoot = NULL, double cof = 150) {
    double argmax = -99999999, ucb;
    Node* best = NULL;
    if (tarRoot == NULL) tarRoot = subroot;
    for (auto c : tarRoot->child) {
        ucb = 1.0 * c->score / c->total + cof * sqrt(log(tarRoot->total) / c->total);
        if (ucb > argmax) {
            argmax = ucb;
            best = c;
        }
    }
    return best;
}

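Note that the UCT formula is the same whether the node is our own or the opponent's. If you do something like adding a minus sign when computing the opponent's UCB value, what happens in practice is that the displayed win rate for your side reads 99.98% and then suddenly drops to zero. The reason is that in some positions near the end of the game, the opponent's choice may decide who wins. The minus sign amounts to assuming that the opponent plays the worst possible move and computing your win rate on that basis. As long as the opponent is not a fool and plays a move that favors them, you are certain to lose, and the win rate drops to zero.

That covers getTail. To sum up, when getTail finishes we have a board obtained as follows: starting from the current board of the game, moves are played along the main path of the UCT tree, either until the game ends or until a node that is not fully expanded is reached. In the latter case, a previously untried square is chosen at random and played, and the new node is recorded in the tree accordingly.

After getTail comes simulate. It is a function of the Board class, so a quick look is enough: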

int Board::simulate() {
    vector<pair<int, int>> aps;
    int score, tmpcolor = latestColor, tmpBoard[8][8];
    chesspos tmpStep = latestStep;
    memcpy(tmpBoard, chessboard, sizeof(chessboard)); // The lines above back up the current board. The backup exists for debugging; simulate is not actually called on the live game board, so it could probably be skipped.
    while (!isEnd()) {
        randomMove(); 
    }
    score = calScore(); // Save the score
    memcpy(chessboard, tmpBoard, sizeof(chessboard)); // From here on, restore the board.
    latestColor = tmpcolor;
    latestStep = tmpStep;
    return score; // Return the simulation result
}

chesspos Board::randomMove()
{
    vector<chesspos> aps;
    int index;
    aps = getValidPos();
    index = rand() % aps.size();
    if (aps[index].first != -1) oneMove(aps[index]); // oneMove already switches the color
    else latestColor = !(bool)latestColor; // Note: getValidPos returns a single (-1,-1) position when there is no playable square.
    latestStep = aps[index];
    return latestStep; // The return value is not strictly needed; it is here for debugging
}

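After simulate comes backup.

void Tree::backup(int result) {
    // The sign of simulate's result records which color won.
    int mod = result > 0 ? BLACK_WINS : WHITE_WINS;
    result = abs(result);
    while (tail != subroot) { 
        tail->total += 64;
        if (!(tail->color ^ mod)) tail->score += result;
        tail = tail->parent;
    }
    // Because of an earlier design decision, subroot still has to be handled separately here. If the parent of subroot were cleared every time the root of the search tree moves, while(tail) would do everything in one go.
    tail->total += 64;
    if (!(tail->color ^ mod)) tail->score += result; // Compares the color with mod, as in the loop. This line can probably be dropped, since the root's win rate is never used in the computation.
}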

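The scoring rules need some explanation here. In my own experiments I designed two scoring rules: one counts wins and losses, the other counts the number of our own pieces left on the final board.

If wins and losses are counted, every node on the main path gets Cv+1 and the nodes of the winning color get Cw+1; the losing side is not penalized.
If pieces are counted, every node gets Cv+64 and the winning side's Cw is increased by its number of pieces on the board; again, the losing side is not penalized. Note that you must not use Cv+32 and then deduct points from the losing side in Cw; that leads to all sorts of strange behavior.

How do the two scoring rules differ in effect?

Counting wins and losses lets you read the win rate directly, but in the end it can only aim at winning and cannot aim at winning by more. In this case the scaling coefficient c stays at the usual 1.38.
Counting pieces subdivides the win rate by the number of pieces won, so the AI can go after a larger margin. However, Cv+64 makes the counts grow too fast, and a coefficient of 1.38 leads to extremely unbalanced exploitation, so c has to be increased. I tried coefficients from 88.32 to 180, but because of time constraints I could not bring out the differences between them clearly. In the end I used 150; a somewhat smaller value would of course also be fine.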
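As a rough sanity check on the size of c under the piece-counting rule, here is a back-of-the-envelope calculation. It is my own arithmetic, not something measured in the lab. If every visit adds 64 instead of 1, the stored counts are 64 C_v and 64 P_v, while the average score C_w / C_v still lies between 0 and 1. With a coefficient c', the exploration term of the formula used in bestChild becomes:

```latex
c' \sqrt{\frac{\ln(64 P_v)}{64\, C_v}}
  = \frac{c'}{8} \sqrt{\frac{\ln P_v + \ln 64}{C_v}}
```

So reproducing the balance that c = 1.38 gives under win/loss counting would take c' of about 8 × 1.38 ≈ 11, or a little less once the extra ln 64 ≈ 4.16 under the square root is taken into account. By this estimate, values such as 88.32 (which is 64 × 1.38) or 150 weight exploration far more heavily than 1.38 does in the win/loss setting, which may be one reason the different coefficients were hard to tell apart.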

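That completes the main process of the UCT algorithm. What remains is some design around how it is operated.

//...search finished
// To get the best node, set the scaling coefficient to 0, i.e. pure exploitation that looks only at the win rate.
Node* best = UCTtree.bestChild(NULL, 0);
UCTtree.nextnode(best->pos, b);
// On to the next turn; it is the opponent's move...

Oh, and what is nextnode?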

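void Tree::nextnode(chesspos nextp, Board board) {
    for (auto c : subroot->child) {
        if (c->pos == nextp) {
            subroot = c; // move the root down to the child that was played
            subroot->score = 0; // discard the statistics gathered so far
            subroot->total = 0;
            subroot->child.clear(); // drops the pointers only; the child nodes themselves are not freed
            return;
        }
    }
    // If no child matches nextp, subroot is left unchanged here; the older full listing at the end creates a new node in that case.
}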

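Although nextnode is simple, it just moves the root node downward, note the following:

Could the target node be one that has not been expanded yet? In practice there is no need to worry about this. A position has at most thirty-odd playable squares, while five seconds already allows more than 800 UCT searches, so there is no need to create new tree nodes for moves that have not been expanded. In addition, bestChild guarantees that the move is chosen only among nodes that have already been expanded.
Note subroot->child.clear(): every time the root moves, the previous search results are deliberately not kept, because they might interfere with the judgment of the best child. Besides, reusing search results is not very efficient in practice, and keeping them would not improve playing strength much.

How does UCT perform in practice?

You may have noticed that I did not write a function to free the dynamically allocated nodes. This means a lot of memory is used: about 15 MB per game in my tests. You are welcome to write the cleanup yourself.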
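For readers who do want the cleanup, one way to write it is a recursive delete over the child pointers. The sketch below uses a stripped-down stand-in for the Node class, and the function names freeSubtree and freeChildren are my own; it is one possible approach, not the code from this project.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for the Node class above (only the fields needed here).
struct Node {
    Node* parent;
    std::vector<Node*> child;
    explicit Node(Node* par) : parent(par) {}
};

// Delete `node` and everything below it; returns the number of nodes freed.
int freeSubtree(Node* node) {
    if (node == nullptr) return 0;
    int freed = 1;
    for (Node* c : node->child) freed += freeSubtree(c);
    delete node;
    return freed;
}

// Free every child of `node` but keep `node` itself, for example right
// before the children of the new root are cleared.
int freeChildren(Node* node) {
    assert(node != nullptr);
    int freed = 0;
    for (Node* c : node->child) freed += freeSubtree(c);
    node->child.clear();
    return freed;
}
```

In nextnode, calling something like freeChildren(subroot) in place of the bare subroot->child.clear() would release the discarded subtree; the siblings of the chosen child would need the same treatment. The recursion depth is bounded by the length of a game, about 60 moves, so stack usage is not a concern here.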

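The following screenshots are all logs of games against the monkey (completely random play).

The above is the result when wins and losses are counted. By move 19 there are already 1360 searches. Moves in the corners show a high win rate, so the corners are also searched comparatively more often. The program finally chooses to play in a corner, with a win rate of 60% at that point.

Under the win/loss scoring rule, the algorithm determines at move 47 that it will win, and the win rate is shown as 100%.

With the piece-counting rule, what is computed is the expected number of our pieces left on the board. An expectation of 47 pieces means a very high chance of winning, and judging by the AI's behavior it has already left the opponent with no legal move, ending in a complete victory.

This position even came up: the opponent was a classmate's pruning algorithm and UCT played White. UCT was struggling to hold its own, but thanks to a few mistakes by the opponent, it still won even after losing three corners!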

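Summary of UCT

Although UCT relies on random simulation, with enough simulations and the UCB strategy it can still perform quite well.
UCT is independent of the game itself. As long as the interface is available, most similar games can use UCT, for example Gomoku and Chinese chess.
α-β pruning is a commonly used algorithm, but it needs a carefully crafted evaluation function for the specific game. By comparison, UCT may not beat a finely tuned pruning algorithm, but it only needs one scaling coefficient to be tuned, which saves a lot of effort.
The number of searches is another factor that limits UCT. In the opening only about 800 searches are possible, and only later in the game does it reach thousands or tens of thousands. If the opening goes badly, UCT may fail to set up a good position for itself and report a low win rate early on. Still, it is more than enough to deal with the monkey.

Code that actually runs

Having runnable code to look at would have helped me a great deal when I was studying UCT. However, the code below is what I used for local debugging: it is rather messy, contains many unused leftovers, has an untidy main function, and is not the latest version overall. Take a look if you are interested; the relevant code has already been presented in tidied form above.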

#include <cstdio>
#include <cstdlib> // abs, rand, srand, system
#include <cstring> // memcpy
#include <cmath>   // sqrt, log
#include <iostream>
#include <vector>
#include <ctime>
#include <string>
#include <fstream>
#define BLACK_WINS 0
#define WHITE_WINS 1
//#define TESTING
using namespace std;
typedef pair<int, int> chesspos;

class Node {
public:
    chesspos pos;
    int total, score; // long long?
    int color;
    Node* parent;
    vector<Node*> child;
    vector<pair<int, int>> validPos;
    Node(chesspos p, int c, Node* par, vector<chesspos> v);
};
class Board {
    int chessboard[8][8], latestColor;
    chesspos latestStep;
public:
    Board();
    int calScore();
    bool isEnd();
    bool notConj(chesspos a, chesspos b);
    bool search(chesspos p, int color, int d);
    void rev(chesspos p, int color, int d);
    void oneMove(chesspos p);
    vector<chesspos> getValidPos(int, chesspos);
    chesspos randomMove();
    int simulate();
    void graphBoard();
    void printScore();
};

class Tree {
    Node *root, *subroot, *tail;
    int ownColor;
public:
    Tree(int ownc, Board board);
    Node* expend(Board board);//expand tail
    void nextnode(chesspos nextp, Board board); //includes construction of a node that does not exist yet
    Node* bestChild(Node * tarRoot, double cof);
    Board getTail(Board board);//tree policy
    void backup(int result);
    void printInfo();
    void newTurn();
};
int dr[8] = { 0,0,1,1,1,-1,-1,-1 };
int dc[8] = { 1,-1,1,0,-1,1,0,-1 };

int main() {
    srand((unsigned)time(NULL));
    int x, y;
    clock_t s, n;
    Board b=Board();
    Tree UCTtree(0, b);
    Node* best;
    int res, searchCount ,total = 0;
    chesspos r;
    while (!b.isEnd()) {
        s = clock();
        n = clock();
        searchCount = 0;
        //clock() counts ticks, so convert to milliseconds before comparing
        while ((n - s) * 1000 / CLOCKS_PER_SEC < 4750) {
            //Board t = UCTtree.getTail(b);
            //res = t.simulate();
            UCTtree.backup(UCTtree.getTail(b).simulate());
            //printf("%d\n", i);
            n = clock();
            searchCount++;
        }
        n = clock();
        printf("time use:%d\nSearch times:%d\n", (int)((n - s) * 1000 / CLOCKS_PER_SEC), searchCount);
        UCTtree.printInfo();
        best = UCTtree.bestChild(NULL, 0);
        printf("win rate:%lf\n", 1.0 * best->score / best->total * 64);
        b.oneMove(best->pos);
        total++;
        printf("total:%d\n", total);
        b.graphBoard();
        if (!b.isEnd()) {
            UCTtree.nextnode(best->pos, b);
            UCTtree.printInfo();
            best = UCTtree.bestChild(NULL, 0);
            //the new subroot may not have any children yet
            if (best) printf("win rate:%lf\n", 1.0 * best->score / best->total * 64);
            //cin >> x >> y;
            //b.oneMove(r);    //if you want to see how monkey moves, delete this two lines and use the next line
            r = b.randomMove();
            total++;
            printf("The monkey choose to move in (%d,%d)\n", r.first, r.second);
            printf("total:%d\n", total);
            UCTtree.nextnode(r, b);
            b.graphBoard();
        }
        system("pause"); //Windows only
    }
    b.printScore();
    system("pause");
    //take white as owncolor, monkey mode only
    total = 0;
    b = Board();
    UCTtree.newTurn();
    while (!b.isEnd()) {
        UCTtree.printInfo();
        best = UCTtree.bestChild(NULL, 0);
        if (best) printf("win rate:%lf\n", 1.0 * best->score / best->total * 64);
        r = b.randomMove();
        total++;
        printf("The monkey choose to move in (%d,%d)\n", r.first, r.second);
        printf("total:%d\n", total);
        UCTtree.nextnode(r, b);
        b.graphBoard();
        if (!b.isEnd()) {
            s = clock();
            n = clock();
            searchCount = 0;
            while ((n - s) * 1000 / CLOCKS_PER_SEC < 4750) {
                res = UCTtree.getTail(b).simulate();
                UCTtree.backup(res);
                //printf("%d\n", i);
                n = clock();
                searchCount++;
            }
            n = clock();
            printf("time use:%d\nSearch times:%d\n", (int)((n - s) * 1000 / CLOCKS_PER_SEC), searchCount);
            UCTtree.printInfo();
            best = UCTtree.bestChild(NULL, 0);
            printf("win rate:%lf\n", 1.0 * best->score / best->total * 64);
            b.oneMove(best->pos);
            UCTtree.nextnode(best->pos, b);
            total++;
            printf("total:%d\n", total);
            b.graphBoard();
            system("pause");
        }
    }
    b.printScore();
    system("pause");
    return 0;
}

Board::Board()
{
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
            chessboard[i][j] = -1;
    chessboard[3][3] = 0;
    chessboard[4][4] = 0;
    chessboard[3][4] = 1;
    chessboard[4][3] = 1;
    latestColor = 1;//gameStartRoot is white, next and first move is black
    latestStep = make_pair(-1, -1);
}

int Board::calScore() {
    int c[2] = { 0 };
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
            if (chessboard[i][j] >= 0) c[chessboard[i][j]]++;
#ifdef TESTING
    return c[0] - c[1];
#else
    //positive: black wins with c[0] pieces; negative: white wins with c[1] pieces; 0: draw
    if (c[0] == c[1]) return 0;
    return c[0] > c[1] ? c[0] : -1 * c[1];
    //return c[0] > c[1] ? BLACK_WINS : WHITE_WINS;
#endif // TESTING
}

bool Board::isEnd() {
    if (getValidPos(1, make_pair(-1, -1))[0].first == -1 && getValidPos(0, make_pair(-1, -1))[0].first == -1) return true;
    return false;
}

bool Board::notConj(chesspos a, chesspos b) {
    if (abs(a.first - b.first) + abs(a.second - b.second) == 1) return false;
    else return true;
}

bool Board::search(chesspos p, int color, int d)
{
    int r = p.first + dr[d], c = p.second + dc[d];
    if (r < 0 || r > 7 || c < 0 || c > 7) return false; //neighbor is off the board
    if (chessboard[r][c] == color) return false; //diff color should be in the middle
    while (0 <= r && r <= 7 && 0 <= c && c <= 7) {
        if (chessboard[r][c] == -1) return false;
        else if (chessboard[r][c] == color) return true;
        else {
            r += dr[d];
            c += dc[d];
        }
    }
    return false;
}

void Board::rev(chesspos p, int color, int d)
{
    int r = p.first + dr[d], c = p.second + dc[d], oppcolor = !(bool)color;
    while (0 <= r && r <= 7 && 0 <= c && c <= 7 && chessboard[r][c] == oppcolor) {
        chessboard[r][c] = color;
        r += dr[d];
        c += dc[d];
    }
}

void Board::oneMove(chesspos p)
{
    latestColor = !(bool)latestColor;
    if (p.first != -1) {
        chessboard[p.first][p.second] = latestColor;
        for (int d = 0; d < 8; d++) { //flip in 8 direction
            if (search(p, latestColor, d)) {
                rev(p, latestColor, d);
            }
        }
    }
    latestStep = p;
}

vector<chesspos> Board::getValidPos(int targetColor = -1, chesspos lstep = make_pair(233, 233))
{
    vector<chesspos> result;
    if (targetColor == -1) targetColor = !(bool)latestColor; //next step is for the opp
    if (lstep == make_pair(233, 233)) lstep = latestStep;
    //the original "#pragma omp parallel for" is dropped: result and pos were shared between threads
    for (int k = 0; k < 64; k++) {
        int i = k / 8;
        int j = k % 8;
        if (chessboard[i][j] == -1) {
            chesspos pos = make_pair(i, j);
            for (int d = 0; d < 8; d++) {
                if (notConj(pos, lstep) && search(pos, targetColor, d)) {
                    result.push_back(pos);
                    break;
                }
            }
        }
    }
    if (result.size() == 0) result.push_back(make_pair(-1, -1));
    return result;
}

chesspos Board::randomMove()
{
    vector<pair<int, int>> aps;
    int index;
    aps = getValidPos();
    index = rand() % aps.size();
    if (aps[index].first != -1) oneMove(aps[index]); //in this func color has been flipped
    else latestColor = !(bool)latestColor;
    latestStep = aps[index];
    return latestStep;
}

int Board::simulate() {
    vector<pair<int, int>> aps;
    int index, tmpcolor = latestColor, tmpBoard[8][8];
    chesspos tmpStep = latestStep;
    memcpy(tmpBoard, chessboard, sizeof(chessboard)); //backup
    while (!isEnd()) {
        randomMove();
#ifdef TESTING
        graphBoard(); //
#endif // TESTING
    }
    index = calScore(); //for temp use
    memcpy(chessboard, tmpBoard, sizeof(chessboard)); // reset to initial state
    latestColor = tmpcolor;
    latestStep = tmpStep;
    return index;
}

void Board::graphBoard() {
    int markboard[8][8];
    vector<chesspos> aps = getValidPos();
    memcpy(markboard, chessboard, sizeof(chessboard));
    if (latestStep.first != -1) markboard[latestStep.first][latestStep.second] = 2;
    for (auto p : aps)
        if (p.first != -1) markboard[p.first][p.second] = 3; //skip the (-1,-1) "no move" marker
    printf("  0 1 2 3 4 5 6 7\n");
    for (int i = 0; i < 8; i++) {
        printf(" %d", i);
        for (int j = 0; j < 8; j++) {
            switch (markboard[i][j]) {
            case 0:printf("●"); break;
            case 1:printf("○"); break;
            case 2: {
                if (latestColor == 0) printf("★");
                else printf("☆");
                break;
            }
            case 3:printf("♂"); break;
            case -1: {
                if (notConj(latestStep, make_pair(i, j))) printf("  ");
                else printf("×");
                break;
            }
            }
        }
        printf("||\n");
    }
    printf("=======================\n");
}

void Board::printScore() {
    int c[2] = { 0 };
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
            if (chessboard[i][j] != -1) c[chessboard[i][j]]++;
    printf("BLACK %d : %d WHITE\n", c[0], c[1]);
}
Node::Node(chesspos p, int c, Node * par, vector<chesspos> v)
{
    pos = p;
    color = c;
    parent = par;
    total = 0;
    score = 0;
    validPos = v;
}

Tree::Tree(int ownc, Board board)
{
    root = new Node(make_pair(-1, -1), 1, NULL, board.getValidPos());
    subroot = root;
    tail = root;
    ownColor = ownc;
}

Node* Tree::expend(Board board) {
    Node* newNode;
    vector<chesspos> possiblePos;
    bool matched;
    for (auto v : tail->validPos) {
        matched = false;
        for (auto c : tail->child) {
            if (v == c->pos) {
                matched = true;
                break;
            }
        }
        if (!matched) possiblePos.push_back(v);
    }
    int index = rand() % possiblePos.size();
    board.oneMove(possiblePos[index]);
    newNode = new Node(possiblePos[index], !(bool)tail->color, tail, board.getValidPos());
    tail->child.push_back(newNode);
    return newNode;
}

void Tree::nextnode(chesspos nextp, Board board) {
    for (auto c : subroot->child) {
        if (c->pos == nextp) {
            subroot = c;
            return;
        }
    }
    //no child matched
    board.oneMove(nextp);
    Node *newNode = new Node(nextp, !(bool)subroot->color, subroot, board.getValidPos());
    subroot = newNode;
}

Node* Tree::bestChild(Node *tarRoot = NULL, double cof = 150) {
    double argmax = -99999999, ucb;
    Node* best = NULL;
    if (tarRoot == NULL) tarRoot = subroot;
    for (auto c : tarRoot->child) {
        ucb = 1.0 * c->score / c->total + cof * sqrt(log(tarRoot->total) / c->total);
        if (ucb > argmax) {
            argmax = ucb;
            best = c;
        }
    }
    return best;
}
Board Tree::getTail(Board board) {
    tail = subroot;
    while (!board.isEnd()) {
        int vs = tail->validPos.size(), cs = tail->child.size();
        if (vs != cs) {
            tail = expend(board);
            board.oneMove(tail->pos);
            break;
        }
        else {
            tail = bestChild(tail);
            board.oneMove(tail->pos);
        }
    }
    return board;
}

void Tree::backup(int result) {
    //if a subroot is avoidable, then use while(tail) for root node's parent is NULL
    int mod = result > 0 ? BLACK_WINS : WHITE_WINS;
    result = abs(result);
    while (tail != subroot) {
        tail->total += 64;
        if (!(tail->color ^ mod)) tail->score += result;
        //tail->score += result * mod;
        //mod *= -1;
        tail = tail->parent;
    }
    tail->total += 64;
    //tail->score += result * mod;
    if (!(tail->color ^ mod)) tail->score += result; //compare with mod, as in the loop above
}

void Tree::printInfo() {
    printf("subroot:(%d,%d)\n", subroot->pos.first, subroot->pos.second);
    for (auto c : subroot->child) {
        printf("-child:(%d,%d), score:%d, total:%d\n", c->pos.first, c->pos.second, c->score, c->total);
    }
    Node* n = bestChild(subroot, 0);
    chesspos p = n == NULL ? make_pair(8, 8) : n->pos;
    printf("bestchild:(%d,%d)\n",p.first,p.second);
}

void Tree::newTurn() {
    subroot = root;
    ownColor = !(bool)ownColor;
}
