Linux MPI Cluster Configuration


Reference: Setting Up and Configuring an MPI Parallel Programming Environment on Linux (Linux下MPI並行編程環境搭建配置)

 

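MPI (Message Passing Interface) is a standard for parallel computing based on message passing, and MPICH is one implementation of it. This cluster is installed on virtual machines running Ubuntu 14.04. It uses three machines, all with the user name ubuntu, and the host names ub0, ub1 and ub2.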

  • Install MPICH

      $ tar -xzvf soft/mpich-3.0.4.tar.gz
      $ cd mpich-3.0.4/
      $ ./configure --prefix=/usr/local/mpich
      $ make && sudo make install

      After the install finishes, append the following lines to /etc/profile, then run source /etc/profile so that the current shell picks them up:

      PATH=$PATH:/usr/local/mpich/bin
      MANPATH=$MANPATH:/usr/local/mpich/man
      export PATH MANPATH
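Editing /etc/profile by hand is easy to repeat by accident, which leaves duplicate entries. Below is a minimal sketch of an idempotent version of this step. The helper names append_once and add_mpich_env are hypothetical and not part of MPICH.

```shell
# Append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper, not an MPICH tool.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -x: whole-line match, -F: fixed string, -q: no output
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Add the three MPICH environment lines to a profile file.
# The default target is /etc/profile, which requires root to modify.
add_mpich_env() {
    local profile="${1:-/etc/profile}"
    append_once 'PATH=$PATH:/usr/local/mpich/bin' "$profile"
    append_once 'MANPATH=$MANPATH:/usr/local/mpich/man' "$profile"
    append_once 'export PATH MANPATH' "$profile"
}
```

Run add_mpich_env as root, or pass a scratch file as the first argument to try it out. After source /etc/profile, which mpiexec should print a path under /usr/local/mpich/bin.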
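  • Single-node test
    • Copy the examples directory from the source tree into the install directory. Run this from inside mpich-3.0.4/; sudo is needed because /usr/local/mpich was created by sudo make install.

      $ sudo cp -r examples/ /usr/local/mpich

      Then, from /usr/local/mpich, run

      $ mpirun -np 10 ./examples/cpi

      The output looks like this:

      Process 0 of 10 is on ub0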
      Process 9 of 10 is on ub0
      Process 1 of 10 is on ub0
      Process 4 of 10 is on ub0
      Process 5 of 10 is on ub0
      Process 7 of 10 is on ub0
      Process 2 of 10 is on ub0
      Process 3 of 10 is on ub0
      Process 6 of 10 is on ub0
      Process 8 of 10 is on ub0
      
      pi is approximately 3.1415926544231256, Error is 0.0000000008333325
      wall clock time = 0.020644
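To check a run without reading the listing by eye, the rank lines can be counted. The following is a small sketch that assumes cpi keeps printing one "Process N of M is on HOST" line per rank, as in the output above. check_ranks is a hypothetical helper name.

```shell
# Read program output on stdin and verify that the expected number of
# "Process N of M is on HOST" lines is present.
# check_ranks is a hypothetical helper, not part of MPICH.
check_ranks() {
    local expected="$1" actual
    # grep -c prints the count; "|| true" keeps a zero count from aborting
    # scripts that run with "set -e".
    actual=$(grep -c '^ *Process [0-9][0-9]* of [0-9][0-9]* is on ' || true)
    if [ "$actual" -eq "$expected" ]; then
        echo "ok: $actual ranks reported"
    else
        echo "mismatch: expected $expected ranks, saw $actual" >&2
        return 1
    fi
}
```

For example: mpirun -np 10 ./examples/cpi | check_ranks 10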
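  • Cluster configuration
    • First set up passwordless SSH login. ub0 acts as the master node and the other machines (ub1, ub2) are slave nodes. The steps are:
      • The master's public key has to be installed on every slave so that the slave trusts logins from the master and does not ask for a password. The steps below install ub0's key on ub1 as an example.
      • Generate a key pair on ub0
      • $ ssh-keygen -t rsa

        Press Enter at every prompt to accept the defaults.

      • Copy /home/ubuntu/.ssh/id_rsa.pub from ub0 into /home/ubuntu/.ssh/ on ub1. If that directory does not exist on ub1, create it first with mkdir .ssh in /home/ubuntu. If ub1 already has its own id_rsa.pub there, copy the file under a different name so it is not overwritten.
      • On ub1, in /home/ubuntu/.ssh/, run
      • $ cat id_rsa.pub >> authorized_keys
      • From ub0, ssh to ub1 to confirm that no password is requested. If it works, repeat the steps for the next node.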
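The append step on the slave can be sketched as a small function that also sets the permissions sshd conventionally expects (700 on .ssh, 600 on authorized_keys) and avoids adding the same key twice. install_pubkey is a hypothetical helper name, and the permission handling is an addition that the steps above do not mention.

```shell
# install_pubkey PUBKEY_FILE [SSH_DIR]
# Append a public key to SSH_DIR/authorized_keys (default: ~/.ssh) unless it
# is already there. Hypothetical helper; run it on the slave node.
install_pubkey() {
    local pubkey="$1" ssh_dir="${2:-$HOME/.ssh}"
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # Compare the whole key line as a fixed string before appending
    grep -qxF -- "$(cat "$pubkey")" "$ssh_dir/authorized_keys" \
        || cat "$pubkey" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}
```

On systems that ship it, running ssh-copy-id ubuntu@ub1 from ub0 performs the copy and the append in one step.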
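    • Copy the compiled installation to the other machines so that MPICH does not have to be built from source on each of them, which saves time. Run this from /usr/local on ub0; the ubuntu user needs write permission on /usr/local on the target machines.

      $ scp -r mpich ub1:/usr/local/
      $ scp -r mpich ub2:/usr/local/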
    • Append the following to /etc/hosts on ub0, ub1 and ub2

      192.168.0.2 ub0
      192.168.0.3 ub1
      192.168.0.4 ub2

      Note that the entries must be added on all three machines.
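Because the same three entries go onto every machine, it can help to script the append so that re-running it does not duplicate lines. This is a sketch only; add_host is a hypothetical helper name, and writing to /etc/hosts requires root.

```shell
# add_host IP NAME [FILE]
# Append "IP NAME" to FILE (default: /etc/hosts) unless a non-comment line
# already maps NAME. Hypothetical helper.
add_host() {
    local ip="$1" name="$2" file="${3:-/etc/hosts}"
    # Match NAME as a whole word after whitespace, ignoring commented lines
    if grep -qE "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

For the cluster above, the calls would be add_host 192.168.0.2 ub0, add_host 192.168.0.3 ub1 and add_host 192.168.0.4 ub2, run as root on each machine.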

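    • Copy the executable /usr/local/mpich/examples/cpi, which computes pi, from the master node into /home/ubuntu, and send it to /home/ubuntu on ub1 and ub2 as well
    • On the master node, create a file named servers in /home/ubuntu that lists each machine name in the cluster and the number of processes for it

      ub0:2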
      ub1:2
      ub2:2
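When every node gets the same process count, the servers file can be generated instead of typed. A minimal sketch, with make_hostfile as a hypothetical helper name:

```shell
# make_hostfile SLOTS HOST...
# Print one "HOST:SLOTS" line per host, in the format used by the
# servers file above. Hypothetical helper.
make_hostfile() {
    local slots="$1" h
    shift
    for h in "$@"; do
        printf '%s:%s\n' "$h" "$slots"
    done
}
```

For example, make_hostfile 2 ub0 ub1 ub2 > servers writes the three lines shown above.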
    • On ub0, in /home/ubuntu, run

      $ mpiexec -n 10 -f servers ./cpi

      The output looks like this:

      Process 0 of 10 is on ub0
      Process 1 of 10 is on ub1
      Process 4 of 10 is on ub0
      Process 5 of 10 is on ub2
      Process 6 of 10 is on ub1
      Process 7 of 10 is on ub2
      Process 8 of 10 is on ub0
      Process 9 of 10 is on ub1
      Process 2 of 10 is on ub2
      Process 3 of 10 is on ub1
      pi is approximately 3.1415926544231256, Error is 0.0000000008333325
      wall clock time = 0.018768
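To see how the ranks were spread across the machines, the host name at the end of each "Process" line can be tallied. This sketch assumes the output format shown above. ranks_per_host is a hypothetical helper name.

```shell
# Read mpiexec output on stdin and print "HOST COUNT" for each host that
# reported at least one rank, sorted by host name. Hypothetical helper.
ranks_per_host() {
    awk '/^ *Process [0-9]+ of [0-9]+ is on / { n[$NF]++ }
         END { for (h in n) print h, n[h] }' | sort
}
```

For example: mpiexec -n 10 -f servers ./cpi | ranks_per_host. For the ten process lines listed above, the tally is ub0 3, ub1 4 and ub2 3.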

