Building a k8s Cluster on Ubuntu 18.04


Last month I set up a k8s cluster with NVIDIA GPUs for my team. I am recording the steps here so I do not forget them later.

This setup uses Ubuntu 18.04 Server, Docker 19.03.2, and k8s 1.15.2.

name             version
ubuntu server    18.04
docker           19.03.2
k8s              1.15.2

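The NVIDIA GPU driver must be installed before building the cluster; driver installation is not covered here.

Each node needs a static IP address and DNS configured; otherwise containers may be unable to reach the external network.

Installation is automated with a shell script. The install.sh file is as follows: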

#!/bin/bash
# Install the FTP client
sudo apt-get install -y lftp
# Set the time zone
sudo ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
echo 'Asia/Shanghai' | sudo tee /etc/timezone > /dev/null

# Switch apt to the Aliyun mirror, backing up the original list first
echo "Switching apt sources to the Aliyun mirror"
sudo mv /etc/apt/sources.list /etc/apt/sources.list.bak
sudo rm -f /etc/apt/sources.list.save
sudo cp -f sources.list /etc/apt
sudo apt-get update

# Install Docker
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce=5:19.03.2~3-0~ubuntu-bionic docker-ce-cli=5:19.03.2~3-0~ubuntu-bionic

# Install nvidia-container; make sure the NVIDIA GPU driver is already installed
distribution=$(. /etc/os-release; echo "$ID$VERSION_ID")
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list" | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo apt-get install -y nvidia-container-runtime

# Docker configuration file
sudo mkdir -p /etc/docker
sudo cp -f daemon.json /etc/docker
sudo systemctl daemon-reload
sudo systemctl restart docker

# Install the k8s components
curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list > /dev/null
sudo apt-get update
sudo apt-get install -y kubelet=1.15.2-00 kubeadm=1.15.2-00 kubectl=1.15.2-00
# Hold the packages by name so later apt upgrades do not move them off 1.15.2
sudo apt-mark hold kubelet kubeadm kubectl
sudo cp -f 10-kubeadm.conf /etc/systemd/system/kubelet.service.d/
# Reload systemd so the updated kubelet drop-in is picked up
sudo systemctl daemon-reload

# DNS settings
sudo cp -f resolved.conf /etc/systemd/resolved.conf
sudo systemctl restart systemd-resolved

That is the installation script. The Aliyun apt sources file it copies is as follows:

#sources.list
deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse

The Docker daemon.json file is as follows (if the machine has no GPU, delete the default-runtime and runtimes entries):

{
    "exec-opts": ["native.cgroupdriver=systemd"],
    "registry-mirrors":["http://hub-mirror.c.163.com","https://registry.docker-cn.com","https://docker.mirrors.ustc.edu.cn","https://pee6w651.mirror.aliyuncs.com"],
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }   
    }   
}
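
The GPU versus non-GPU choice described above could be scripted rather than edited by hand. The following is a minimal sketch and not part of the original install.sh: the function name render_daemon_json and its gpu/nogpu argument are hypothetical, and the registry mirror list is shortened to a single entry for brevity.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): print a daemon.json
# for a GPU node or a non-GPU node. The mirror list is shortened here.
render_daemon_json() {
    local mode="$1"
    if [ "$mode" = "gpu" ]; then
        cat <<'EOF'
{
    "exec-opts": ["native.cgroupdriver=systemd"],
    "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn"],
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
    else
        # Without a GPU, default-runtime and runtimes are simply omitted.
        cat <<'EOF'
{
    "exec-opts": ["native.cgroupdriver=systemd"],
    "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn"]
}
EOF
    fi
}
```

For example, running render_daemon_json nogpu > daemon.json before bash install.sh would produce the file for a CPU-only node. One possible way to pick the mode automatically is to test whether command -v nvidia-smi succeeds, although that is only a heuristic.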

The kubeadm drop-in for the kubelet service, 10-kubeadm.conf, is as follows:

# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice" 
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

On Ubuntu 18.04 the static IP is configured through netplan. The file is 50-cloud-init.yaml, in the following format:

# This file is generated from information provided by
# the datasource.  Changes to it will not persist across an instance.
# To disable cloud-init's network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    ethernets:
        enp4s0:
            dhcp4: no
            addresses: [10.254.18.6/24]
            gateway4: 10.254.18.1 
    version: 2
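
Since every node needs its own address, the netplan file above could be generated per machine. The sketch below is an illustration only: render_netplan is a hypothetical helper, and it emits just the fields shown in the file above (interface, address, gateway).

```shell
#!/bin/bash
# Hypothetical helper: print a netplan config with a single static address.
# Arguments: interface name, address in CIDR form, IPv4 gateway.
render_netplan() {
    local iface="$1"
    local address="$2"
    local gateway="$3"
    cat <<EOF
network:
    ethernets:
        ${iface}:
            dhcp4: no
            addresses: [${address}]
            gateway4: ${gateway}
    version: 2
EOF
}
```

A possible usage, assuming the file lives at the usual /etc/netplan/50-cloud-init.yaml location: render_netplan enp4s0 10.254.18.6/24 10.254.18.1 | sudo tee /etc/netplan/50-cloud-init.yaml, followed by sudo netplan apply. Over SSH, sudo netplan try is the more cautious option because it rolls back if the change is not confirmed.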

The DNS configuration file resolved.conf has the following format:

#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See resolved.conf(5) for details

[Resolve]
DNS=192.168.110.213 114.114.114.114
#FallbackDNS=
#Domains=
LLMNR=no
#MulticastDNS=no
#DNSSEC=no
#Cache=yes
#DNSStubListener=yes

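Place install.sh, the Aliyun sources.list, Docker's daemon.json, the kubelet drop-in 10-kubeadm.conf, the static IP file 50-cloud-init.yaml, and the DNS file resolved.conf in the same directory, then run bash install.sh to install everything automatically. Note that install.sh as listed does not copy 50-cloud-init.yaml, so the static IP has to be applied separately.

To install other software versions, edit the script accordingly.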
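
One way to make version changes less error-prone is to build the pinned package names from a single version value. This is a sketch under assumptions: k8s_package_specs is a hypothetical helper, and the -00 package revision is taken from the script above and may differ for other releases.

```shell
#!/bin/bash
# Hypothetical helper: build the "name=version" apt specs for the three
# k8s packages. The revision suffix defaults to 00, as used in install.sh.
k8s_package_specs() {
    local version="$1"
    local revision="${2:-00}"
    echo "kubelet=${version}-${revision} kubeadm=${version}-${revision} kubectl=${version}-${revision}"
}
```

The install line would then become sudo apt-get install -y $(k8s_package_specs 1.15.2). The versions actually offered by the configured mirror can be listed with apt-cache madison kubeadm.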

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The steps above must be run on every machine. To initialize the k8s cluster and join nodes to it, follow the article at https://blog.csdn.net/shykevin/article/details/98811021, but note one point about the init command:

sudo kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.15.2 --pod-network-cidr=192.169.0.0/16

Here pod-network-cidr is 192.169.0.0/16, so when adding the Calico network plugin you need to edit the Calico manifest (http://mirror.faasx.com/k8s/calico/v3.3.2/calico.yaml) and change:

- name: CALICO_IPV4POOL_CIDR
  value: "192.168.0.0/16"

to:

- name: CALICO_IPV4POOL_CIDR
  value: "192.169.0.0/16"

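Otherwise the Calico IP pool will not match the pod CIDR passed to kubeadm init, and containers will be unable to reach the external network.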
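
The manual edit above can also be done with sed. The following is a rough sketch: rewrite_calico_cidr is a hypothetical helper that prints the patched manifest to stdout and leaves the downloaded file untouched, and it assumes the CIDR appears in the manifest as a quoted value: line as shown above.

```shell
#!/bin/bash
# Hypothetical helper: print a copy of a Calico manifest with the pool CIDR
# replaced. Arguments: manifest path, old CIDR, new CIDR.
rewrite_calico_cidr() {
    local manifest="$1"
    local old_cidr="$2"
    local new_cidr="$3"
    local old_escaped
    # Escape the dots so they match literally instead of "any character".
    old_escaped="$(printf '%s' "$old_cidr" | sed 's/\./\\./g')"
    # Use | as the sed delimiter because the CIDR itself contains a slash.
    sed "s|value: \"${old_escaped}\"|value: \"${new_cidr}\"|" "$manifest"
}
```

For example: rewrite_calico_cidr calico.yaml 192.168.0.0/16 192.169.0.0/16 > calico-patched.yaml, after which the patched file can be passed to kubectl apply -f.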

The GPU plugin is nvidia-device-plugin, installed as follows:

kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/master/nvidia-device-plugin.yml

Reference: https://feisky.gitbooks.io/kubernetes/content/plugins/device.html
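
Once the device plugin is running, a quick way to confirm that GPUs are schedulable is to run a pod that requests one. The sketch below only prints a pod manifest. The helper gpu_test_pod_manifest, the pod name gpu-smoke-test, and the image tag nvidia/cuda:10.0-base are all assumptions for illustration; the resource name nvidia.com/gpu is the one the NVIDIA device plugin advertises.

```shell
#!/bin/bash
# Hypothetical helper: print a pod manifest that requests GPUs and runs
# nvidia-smi once. The pod name and image tag are illustrative assumptions.
gpu_test_pod_manifest() {
    local gpus="${1:-1}"
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:10.0-base
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: ${gpus}
EOF
}
```

A possible usage is gpu_test_pod_manifest | kubectl apply -f -, then kubectl logs gpu-smoke-test once the pod has completed. If the pod stays Pending, kubectl describe node on a GPU node shows whether nvidia.com/gpu appears under its capacity.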

 

 

