Related software
1. kubeadm
Installation steps
apt-get update
1. Disable all swap partitions
swapoff -a
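To keep swap disabled after a reboot, also comment out the swap entry in /etc/fstab.
Use the free command to verify that swap is disabled: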
root@gpu-10-0-1-24:~# free
              total        used        free      shared  buff/cache   available
Mem:      528016312     6131652   343432968     6595072   178451692   512492696
Swap:             0           0           0
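The swap step above can be sketched as two small shell helpers. This is a minimal sketch rather than part of the original notes: the function names disable_swap_in_fstab and swap_total_kb are hypothetical, and the sed pattern assumes a conventional fstab layout in which the filesystem type column reads swap.

```shell
# Comment out every active swap entry in an fstab-style file.
# $1: path to the file (defaults to /etc/fstab). A .bak backup is kept.
# Assumption: swap entries contain the word "swap" surrounded by whitespace.
disable_swap_in_fstab() {
    fstab="${1:-/etc/fstab}"
    sed -i.bak -E 's/^([^#].*[[:space:]]swap[[:space:]].*)$/#\1/' "$fstab"
}

# Read `free` output on stdin and print the total column of the Swap: row.
swap_total_kb() {
    awk '/^Swap:/ { print $2 }'
}
```

A possible use on the node, as root: run swapoff -a, then disable_swap_in_fstab, then free | swap_total_kb, which should print 0 once swap is off.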
2. Stop the firewall
systemctl stop firewalld
systemctl disable firewalld
3. Disable SELinux
setenforce 0
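setenforce 0 only changes the mode until the next reboot. A minimal sketch for making the change persistent follows; it assumes the standard /etc/selinux/config layout, and the function name persist_selinux_permissive is hypothetical.

```shell
# Switch SELINUX=enforcing to SELINUX=permissive in an SELinux config file.
# $1: path to the file (defaults to /etc/selinux/config). A .bak backup is kept.
persist_selinux_permissive() {
    cfg="${1:-/etc/selinux/config}"
    sed -i.bak 's/^SELINUX=enforcing$/SELINUX=permissive/' "$cfg"
}
```

Only a line that reads exactly SELINUX=enforcing is rewritten, so a host that is already permissive or disabled is left untouched.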
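Install the flannel network plugin
Initialize the master with the pod network CIDR that flannel uses by default (10.244.0.0/16):
kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=10.0.1.18 --kubernetes-version=v1.11.1 --ignore-preflight-errors=all
Note: the --skip-preflight-checks option is deprecated; use --ignore-preflight-errors instead.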
Errors
[preflight] Activating the kubelet service
failure loading ca certificate: couldn't load the private key file /etc/kubernetes/pki/ca.key: open /etc/kubernetes/pki/ca.key: no such file or directory
Fix: copy the custom PKI keys into the corresponding directory (/etc/kubernetes/pki).
sudo: unable to resolve host gpu-10-0-1-18
Fix: add a hostname mapping to /etc/hosts.
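The hostname fix can be made idempotent with a small helper. This is a sketch under assumptions: the function name add_host_mapping is hypothetical, and pairing the address 10.0.1.18 with the host gpu-10-0-1-18 is inferred from the kubeadm init command and the error message, not stated in the notes.

```shell
# Append "IP HOSTNAME" to a hosts-style file unless the hostname is already mapped.
# $1: IP address, $2: hostname, $3: hosts file (defaults to /etc/hosts).
add_host_mapping() {
    ip="$1"
    name="$2"
    hosts="${3:-/etc/hosts}"
    # Match the hostname only as a whole whitespace-delimited field.
    if ! grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$hosts"; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts"
    fi
}
```

For example, add_host_mapping 10.0.1.18 gpu-10-0-1-18 run as root adds the entry once and does nothing on later runs.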
Check the current SELinux mode:
getenforce
Add a worker node
kubeadm join 10.0.0.39:6443 --token 4g0p8w.w5p29ukwvitim2ti --discovery-token-ca-cert-hash sha256:21d0adbfcb409dca97e655641573b2ee51c77a212f194e20a307cb459e5f77c8
kubeadm token list
kubeadm token create --print-join-command
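If the join command has been lost or was wrapped when copied, the value for --discovery-token-ca-cert-hash can be recomputed on the master and sanity-checked before use. The openssl pipeline below follows the one given in the kubeadm documentation and assumes an RSA CA key at the default path; both function names are hypothetical.

```shell
# Print the sha256:<hex> hash of the cluster CA public key.
# $1: CA certificate (defaults to /etc/kubernetes/pki/ca.crt). Requires openssl.
ca_cert_hash() {
    crt="${1:-/etc/kubernetes/pki/ca.crt}"
    hash=$(openssl x509 -pubkey -in "$crt" \
        | openssl rsa -pubin -outform der 2>/dev/null \
        | openssl dgst -sha256 -hex \
        | sed 's/^.* //')
    printf 'sha256:%s\n' "$hash"
}

# Succeed only if $1 looks like sha256: followed by exactly 64 lowercase hex digits.
is_valid_ca_cert_hash() {
    printf '%s\n' "$1" | grep -Eq '^sha256:[0-9a-f]{64}$'
}
```

The validator catches a hash that picked up a stray space or lost characters in transit, which kubeadm join would otherwise reject.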
apt-get update && apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
For a newly added node, kubectl get nodes shows ROLES as <none>.
kubectl get pods -n kube-system | grep flannel
kubectl get pods -n kube-system -o wide | grep gpu-10-0-1-24
Reference links
https://tomoyadeng.github.io/blog/2018/10/12/k8s-in-ubuntu18.04/index.html
kubeadm token list returns an empty list:
https://www.serverlab.ca/tutorials/containers/kubernetes/how-to-add-workers-to-kubernetes-clusters/
https://stackoverflow.com/questions/51380934/unable-to-connect-worker-node-to-kubernetes-cluster
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
kubeadm join reports success, but the node does not appear in kubectl get nodes:
https://github.com/kubernetes/kubernetes/issues/61224
The connection to the server localhost:8080 was refused - did you specify the right host or port?
https://www.jianshu.com/p/6fa06b9bbf6a
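A common cause of the localhost:8080 error is that kubectl cannot find a kubeconfig and falls back to its default address. Assuming that is the case here, the usual remedy is to copy the admin kubeconfig written by kubeadm init into the current user's home directory. The sketch below wraps those steps; the function name setup_kubeconfig and its optional arguments are hypothetical.

```shell
# Copy a kubeconfig into place so kubectl can find the API server.
# $1: source kubeconfig (defaults to /etc/kubernetes/admin.conf)
# $2: target directory (defaults to $HOME/.kube)
setup_kubeconfig() {
    src="${1:-/etc/kubernetes/admin.conf}"
    dest_dir="${2:-$HOME/.kube}"
    mkdir -p "$dest_dir"
    cp "$src" "$dest_dir/config"
    # Make the copy owned by the invoking user.
    chown "$(id -u):$(id -g)" "$dest_dir/config"
}
```

On the master, calling setup_kubeconfig with no arguments mirrors the instructions kubeadm init prints at the end of a successful run.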
Attempting to reclaim ephemeral-storage
ImagePullBackOff
kubectl -n kube-system logs kube-flannel-ds-jpp96 -c install-cni
A node being Ready does not mean the flannel network plugin is working.
flannel itself also runs from a container image.
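Because flannel runs as a pod, its state on a given node can be checked from the kubectl get pods listing rather than from the node status. The filter below is a minimal sketch: the function name pods_not_running_on is hypothetical, and it assumes the default column order of kubectl get pods -n kube-system -o wide, where STATUS is the third column.

```shell
# Read `kubectl get pods -o wide` output on stdin and print "NAME STATUS"
# for every pod on the given node whose STATUS is not Running.
# $1: node name to match anywhere in the row.
pods_not_running_on() {
    node="$1"
    awk -v node="$node" \
        'NR > 1 && index($0, node) && $3 != "Running" { print $1, $3 }'
}
```

A possible invocation is kubectl get pods -n kube-system -o wide | pods_not_running_on gpu-10-0-1-24; empty output suggests every system pod on that node, flannel included, is running.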
A Kubernetes cluster can have multiple master nodes.
Add a role label to a node
kubectl label node k8s-node1 node-role.kubernetes.io/worker=worker
Running systemctl restart kubelet triggers image pulls over the network.
root@cpu-10-0-3-9:~# ks init xps-kubeflow
INFO Using context "kubernetes-admin@kubernetes" from kubeconfig file "/root/.kube/config"
INFO Creating environment "default" with namespace "default", pointing to "version:v1.8.0" cluster at address "https://10.0.3.9:6443"
INFO Generating ksonnet-lib data at path '/root/xps-kubeflow/lib/ksonnet-lib/v1.8.0'
root@cpu-10-0-3-9:~/xps-k8s# kubectl create -f xps_crd.yaml
customresourcedefinition.apiextensions.k8s.io/xps.tencent.com created
kubectl get crd
Pod sandbox changed, it will be killed and re-created.
docker run --security-opt=no-new-privileges --cap-drop=ALL --network=none -it -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins nvidia/k8s-device-plugin:1.11
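An emptyDir volume is shared only within a single pod, so it is enough to run one container per pod.
By default, Kubernetes does not schedule pods onto the master node. To allow it, remove the master taint: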
kubectl taint nodes --all node-role.kubernetes.io/master-
List all mxjobs
kubectl get mxjobs.kubeflow.org
Assigning pods to nodes:
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity