Original author: Zhangguanzhang
Original link: http://zhangguanzhang.github.io/2019/11/24/kubeadm-base-use/
Part 1: Basic system configuration
We assume your system is an up-to-date, minimal installation.
1. Keep time in sync
yum install chrony -y
systemctl enable chronyd && systemctl restart chronyd
2. Disable swap
swapoff -a && sysctl -w vm.swappiness=0
sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab
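As a quick sanity check (my addition, not in the original post), confirm swap is really gone:
swapon -s               # should print nothing
free -h | grep -i swap  # the Swap line should show 0B used and 0B total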
3. Disable firewalld and SELinux
systemctl stop firewalld && systemctl disable firewalld
setenforce 0
sed -ri '/^[^#]*SELINUX=/s#=.+$#=disabled#' /etc/selinux/config
4. Disable NetworkManager. If your IPs are not managed by NetworkManager, it is recommended to disable it and use the legacy network service instead; that is what we do here.
systemctl disable NetworkManager && systemctl stop NetworkManager
systemctl restart network
5. Install the EPEL repo and replace it with Aliyun's EPEL mirror
yum install epel-release wget -y
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
6. Install dependency packages
yum install -y \
curl \
git \
conntrack-tools \
psmisc \
nfs-utils \
jq \
socat \
bash-completion \
ipset \
ipvsadm \
conntrack \
libseccomp \
net-tools \
crontabs \
sysstat \
unzip \
iftop \
nload \
strace \
bind-utils \
tcpdump \
telnet \
lsof \
htop
Part 2: Kernel modules that kube-proxy in IPVS mode needs loaded at boot
Following convention, we load them with systemd-modules-load instead of putting modprobe calls in /etc/rc.local.
vim /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
br_netfilter
systemctl daemon-reload && systemctl enable --now systemd-modules-load.service
Confirm the kernel modules are loaded:
[root@k8s-m1 ~]# lsmod | grep ip_v
ip_vs_sh               12688  0
ip_vs_wrr              12697  0
ip_vs_rr               12600  0
ip_vs                 145497  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
nf_conntrack          139264  1 ip_vs
libcrc32c              12644  3 xfs,ip_vs,nf_conntrack
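If you want to script this check (my addition, not from the original), a small loop over the module list works:
while read -r mod; do
    # compare against the first column of lsmod so used-by references don't count
    lsmod | awk '{print $1}' | grep -qx "$mod" && echo "loaded:  $mod" || echo "MISSING: $mod"
done < /etc/modules-load.d/ipvs.conf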
Part 3: Kernel parameters
On all machines, set the kernel parameters in /etc/sysctl.d/k8s.conf. IPv6 support is still not great at this point, so IPv6 is disabled there as well.
cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv4.neigh.default.gc_stale_time = 120
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
net.ipv4.ip_forward = 1
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2
# make bridged traffic visible to iptables (needed by kube-proxy/CNI)
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1
net.netfilter.nf_conntrack_max = 2310720
fs.inotify.max_user_watches=89100
fs.may_detach_mounts = 1
fs.file-max = 52706963
fs.nr_open = 52706963
vm.overcommit_memory=1
vm.panic_on_oom=0
EOF
If kube-proxy runs in IPVS mode, also tune the TCP keepalive parameters to avoid connection timeouts:
cat <<EOF >> /etc/sysctl.d/k8s.conf
# https://github.com/moby/moby/issues/31208
# ipvsadm -l --timeout
# fixes long-connection timeouts under ipvs mode; any value below 900 works
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10
EOF
sysctl --system
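To spot-check that the values are live (my addition, not in the original), query a few of them back:
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables net.ipv4.tcp_keepalive_time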
Tune journal logging to avoid collecting the same logs twice and wasting resources, raise the default open-file limits for systemd-started services, and disable reverse DNS lookups for ssh.
# the first two lines do not exist on apt-based systems; running them anyway is harmless
sed -ri 's/^\$ModLoad imjournal/#&/' /etc/rsyslog.conf
sed -ri 's/^\$IMJournalStateFile/#&/' /etc/rsyslog.conf

sed -ri 's/^#(DefaultLimitCORE)=/\1=100000/' /etc/systemd/system.conf
sed -ri 's/^#(DefaultLimitNOFILE)=/\1=100000/' /etc/systemd/system.conf
sed -ri 's/^#(UseDNS )yes/\1no/' /etc/ssh/sshd_config
Maximum open files; following convention, put this in a drop-in config file:
cat>/etc/security/limits.d/kubernetes.conf<<EOF
*       soft    nproc   131072
*       hard    nproc   131072
*       soft    nofile  131072
*       hard    nofile  131072
root    soft    nproc   131072
root    hard    nproc   131072
root    soft    nofile  131072
root    hard    nofile  131072
EOF
Docker's official kernel check script recommends (RHEL7/CentOS7: User namespaces disabled; add 'user_namespace.enable=1' to boot command line). On yum-based systems, enable it with the command below:
grubby --args="user_namespace.enable=1" --update-kernel="$(grubby --default-kernel)"
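To verify the argument was appended to the default kernel entry (my addition; it only takes effect after a reboot):
grubby --info="$(grubby --default-kernel)" | grep -o 'user_namespace.enable=1'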
Part 4: Install Docker
Check whether the kernel and its modules are suitable for running docker (Linux only). The script may fail to download because of the GFW; drop the redirection first to see whether the URL is reachable.
curl -s https://raw.githubusercontent.com/docker/docker/master/contrib/check-config.sh > check-config.sh
bash ./check-config.sh
These days the docker storage driver should be overlay2 (do not use devicemapper, it has far too many pitfalls); in the script output, the main thing to check is whether the overlay2 entries are green (green means supported).
We use the year-versioned docker-ce. Say we want k8s v1.18.5: go to https://github.com/kubernetes/kubernetes/tree/master/CHANGELOG, open the matching CHANGELOG-1.18.md and search for "The list of validated docker versions remain" to find the docker versions validated upstream. Your docker version does not strictly have to be in that list; in practice 19.03 also works (19.03+ fixes a runc performance bug). Here we install docker with docker's official install script (it supports CentOS and Ubuntu).
export VERSION=19.03
curl -fsSL "https://get.docker.com/" | bash -s -- --mirror Aliyun
On all machines, configure registry mirrors and make docker use the systemd cgroup driver; systemd is the official recommendation, see https://kubernetes.io/docs/setup/cri/
mkdir -p /etc/docker/
cat>/etc/docker/daemon.json<<EOF
{
    "exec-opts": ["native.cgroupdriver=systemd"],
    "bip": "169.254.123.1/24",
    "oom-score-adjust": -1000,
    "registry-mirrors": [
        "https://fz5yth0r.mirror.aliyuncs.com",
        "https://dockerhub.mirrors.nwafu.edu.cn/",
        "https://mirror.ccs.tencentyun.com",
        "https://docker.mirrors.ustc.edu.cn/",
        "https://reg-mirror.qiniu.com",
        "http://hub-mirror.c.163.com/",
        "https://registry.docker-cn.com"
    ],
    "storage-driver": "overlay2",
    "storage-opts": [
        "overlay2.override_kernel_check=true"
    ],
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "100m",
        "max-file": "3"
    }
}
EOF
Live Restore Enabled: never turn this on. In some corner cases (containers stuck in the Dead state and the like) the only fix is restarting the docker daemon; with live restore enabled, you end up having to reboot the whole machine instead.
Copy the bash completion script:
cp /usr/share/bash-completion/completions/docker /etc/bash_completion.d/
Start docker and check that the info output looks sane:
systemctl enable --now docker
docker info
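Two things worth confirming in the docker info output are the cgroup driver and live restore; they can also be queried directly (my addition):
docker info --format 'cgroup driver: {{.CgroupDriver}}'        # expect systemd
docker info --format 'live restore:  {{.LiveRestoreEnabled}}'  # expect false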
Part 5: Deploy kube-nginx
Here we use nginx as a local proxy. Because the local proxy runs on every machine, we don't need an SLB and we sidestep the restriction that a VIP cannot be used inside a cloud VPC; the trade-off is that every machine has to run nginx.
Configure /etc/hosts on every machine:
[root@k8s-m1 src]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1 apiserver.k8s.local
192.168.50.101 apiserver01.k8s.local
192.168.50.102 apiserver02.k8s.local
192.168.50.103 apiserver03.k8s.local
192.168.50.101 k8s-m1
192.168.50.102 k8s-m2
192.168.50.103 k8s-m3
192.168.50.104 k8s-node1
192.168.50.105 k8s-node2
192.168.50.106 k8s-node3
Generate the nginx config on every machine. The three apiserverXX hosts entries above are optional: you could write IPs instead of domain names in the config below, but then changing an IP means reloading nginx. Unlike the original author, I compile nginx by hand here.
mkdir -p /etc/kubernetes
[root@k8s-m1 src]# cat /etc/kubernetes/nginx.conf
user nginx nginx;
worker_processes auto;
events {
    worker_connections  20240;
    use epoll;
}
error_log /var/log/kube_nginx_error.log info;

stream {
    upstream kube-servers {
        hash $remote_addr consistent;
        server apiserver01.k8s.local:6443 weight=5 max_fails=1 fail_timeout=3s;
        server apiserver02.k8s.local:6443 weight=5 max_fails=1 fail_timeout=3s;
        server apiserver03.k8s.local:6443 weight=5 max_fails=1 fail_timeout=3s;
    }

    server {
        listen 8443 reuseport;
        proxy_connect_timeout 3s;
        # raise the timeout
        proxy_timeout 3000s;
        proxy_pass kube-servers;
    }
}
Because the local proxy lives on every machine, we avoid needing an SLB and the VPC restriction on VIPs. Build and install kube-nginx; this must be done on all machines:
yum install gcc gcc-c++ -y
groupadd nginx
useradd -r -g nginx nginx
wget http://nginx.org/download/nginx-1.16.1.tar.gz -P /usr/local/src/
cd /usr/local/src/
tar zxvf nginx-1.16.1.tar.gz
cd nginx-1.16.1/
./configure --with-stream --without-http --prefix=/usr/local/kube-nginx --without-http_uwsgi_module --without-http_scgi_module --without-http_fastcgi_module
make && make install

# write the systemd unit
[root@k8s-m1 src]# cat /usr/lib/systemd/system/kube-nginx.service
[Unit]
Description=kube-apiserver nginx proxy
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=forking
ExecStartPre=/usr/local/kube-nginx/sbin/nginx -c /etc/kubernetes/nginx.conf -p /usr/local/kube-nginx -t
ExecStart=/usr/local/kube-nginx/sbin/nginx -c /etc/kubernetes/nginx.conf -p /usr/local/kube-nginx
ExecReload=/usr/local/kube-nginx/sbin/nginx -c /etc/kubernetes/nginx.conf -p /usr/local/kube-nginx -s reload
PrivateTmp=true
Restart=always
RestartSec=5
StartLimitInterval=0
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

systemctl daemon-reload && systemctl enable kube-nginx && systemctl restart kube-nginx
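A quick check that kube-nginx came up and is listening on 8443 (my addition); note that connections through it will only succeed once the apiservers exist:
systemctl is-active kube-nginx
ss -tlnp | grep 8443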
Part 6: kubeadm deployment
1. Configure the Aliyun kubernetes repo
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
EOF
2. Masters
A k8s node is essentially kubelet + a CRI (usually docker); kubectl is a client that reads a kubeconfig and talks to kube-apiserver to manage the cluster; kubeadm handles deployment. So the masters need all three installed, while worker nodes generally do not need kubectl.
Install the packages:
yum install -y \
    kubeadm-1.18.5 \
    kubectl-1.18.5 \
    kubelet-1.18.5 \
    --disableexcludes=kubernetes && \
    systemctl enable kubelet
Install the packages on worker nodes:
yum install -y \
    kubeadm-1.18.5 \
    kubelet-1.18.5 \
    --disableexcludes=kubernetes && \
    systemctl enable kubelet
Configure the cluster settings (on the first master only)
Print the default init configuration:
kubeadm config print init-defaults > initconfig.yaml
# the default init parameters look like this
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: k8s-m1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.16.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
We only care about, and keep, the ClusterConfiguration section, then modify it. The v1beta2 docs below are a useful reference; older releases may use v1beta1, where some fields differ from the newer API, so check godoc for your version:
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#hdr-Basics
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#pkg-constants
https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#ClusterConfiguration
Change the IPs and similar values to match your own environment; if you don't know how to compute CIDRs, don't touch them. Set controlPlaneEndpoint to a domain name (without internal DNS you can simply put it in /etc/hosts on all machines), or to an SLB or VIP. For the reasoning and caveats see https://zhangguanzhang.github.io/2019/03/11/k8s-ha/ where I explain the HA setup in detail, so please don't ask me again. The final yaml is below.
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
imageRepository: registry.aliyuncs.com/k8sxio
kubernetesVersion: v1.18.5 # if the listed image version is wrong, set the correct version here
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
networking: # https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#Networking
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
controlPlaneEndpoint: apiserver.k8s.local:8443 # with a single master, write the master's ip or omit this
apiServer: # https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#APIServer
  timeoutForControlPlane: 4m0s
  extraArgs:
    authorization-mode: "Node,RBAC"
    enable-admission-plugins: "NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeClaimResize,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,Priority,PodPreset"
    runtime-config: api/all=true,settings.k8s.io/v1alpha1=true
    storage-backend: etcd3
    etcd-servers: https://192.168.50.101:2379,https://192.168.50.102:2379,https://192.168.50.103:2379
  certSANs:
  - 10.96.0.1            # first ip of the service cidr
  - 127.0.0.1            # lets you fall back to localhost for debugging if the load balancer breaks with multiple masters
  - localhost
  - apiserver.k8s.local  # domain name of the load balancer, or the vip
  - 192.168.50.101
  - 192.168.50.102
  - 192.168.50.103
  - apiserver01.k8s.local
  - apiserver02.k8s.local
  - apiserver03.k8s.local
  - master
  - kubernetes
  - kubernetes.default
  - kubernetes.default.svc
  - kubernetes.default.svc.cluster.local
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
controllerManager: # https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#ControlPlaneComponent
  extraArgs:
    bind-address: "0.0.0.0"
    experimental-cluster-signing-duration: 867000h
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
  extraVolumes:
  - hostPath: /etc/localtime
    mountPath: /etc/localtime
    name: localtime
    readOnly: true
dns: # https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#DNS
  type: CoreDNS # or kube-dns
  imageRepository: coredns # azk8s.cn is dead; use the official coredns image on dockerhub
  imageTag: 1.6.7 # the aliyun registry only has 1.6.7 for now; see dockerhub for the latest
etcd: # https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#Etcd
  local:
    imageRepository: quay.io/coreos
    imageTag: v3.4.7
    dataDir: /var/lib/etcd
    serverCertSANs: # localhost, 127.0.0.1 and ::1 are included by default for server and peer, no need to list them
    - master
    - 192.168.50.101
    - 192.168.50.102
    - 192.168.50.103
    - etcd01.k8s.local
    - etcd02.k8s.local
    - etcd03.k8s.local
    peerCertSANs:
    - master
    - 192.168.50.101
    - 192.168.50.102
    - 192.168.50.103
    - etcd01.k8s.local
    - etcd02.k8s.local
    - etcd03.k8s.local
    extraArgs: # no extraVolumes here for now
      auto-compaction-retention: "1h"
      max-request-bytes: "33554432"
      quota-backend-bytes: "8589934592"
      enable-v2: "false" # disable etcd v2 api
#  external:  # configure like this for external etcd  https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2#Etcd
#    endpoints:
#    - "https://172.19.0.2:2379"
#    - "https://172.19.0.3:2379"
#    - "https://172.19.0.4:2379"
#    caFile: "/etc/kubernetes/pki/etcd/ca.crt"
#    certFile: "/etc/kubernetes/pki/etcd/etcd.crt"
#    keyFile: "/etc/kubernetes/pki/etcd/etcd.key"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration # https://godoc.org/k8s.io/kube-proxy/config/v1alpha1#KubeProxyConfiguration
mode: ipvs # or iptables
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: "rr" # scheduling algorithm
  syncPeriod: 15s
iptables:
  masqueradeAll: true
  masqueradeBit: 14
  minSyncPeriod: 0s
  syncPeriod: 30s
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration # https://godoc.org/k8s.io/kubelet/config/v1beta1#KubeletConfiguration
cgroupDriver: systemd
failSwapOn: true # set to false if swap is enabled
- For swap, see the last line of the yaml. The apiserver extraArgs are there to enable PodPreset; on 1.16 and earlier, runtime-config should be set to api/all,settings.k8s.io/v1alpha1=true
- With a single master, set controlPlaneEndpoint to the first master's ip
- The etcd versions supported by each release can be found in the code: https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/constants/constants.go#L422-L430
Check the file for mistakes and ignore warnings; real mistakes throw an error, otherwise the output will contain a kubeadm join xxx line and so on:
kubeadm init --config initconfig.yaml --dry-run
Check that the image list is correct; if the version number is wrong, uncomment kubernetesVersion in the yaml and set it to your version:
kubeadm config images list --config initconfig.yaml
Pre-pull the images:
kubeadm config images pull --config initconfig.yaml
# output below
[config/images] Pulled gcr.azk8s.cn/google_containers/kube-apiserver:v1.18.5
[config/images] Pulled gcr.azk8s.cn/google_containers/kube-controller-manager:v1.18.5
[config/images] Pulled gcr.azk8s.cn/google_containers/kube-scheduler:v1.18.5
[config/images] Pulled gcr.azk8s.cn/google_containers/kube-proxy:v1.18.5
[config/images] Pulled gcr.azk8s.cn/google_containers/pause:3.1
[config/images] Pulled quay.azk8s.cn/coreos/etcd:v3.4.7
[config/images] Pulled coredns/coredns:1.6.3
Part 7: kubeadm init
Run the init below on the first master only:
# --experimental-upload-certs uploads the relevant certificates to etcd, saving us from distributing them by hand
# note: from v1.15+ this is a stable flag named --upload-certs; on older versions use --experimental-upload-certs
kubeadm init --config initconfig.yaml --upload-certs
If init times out, check whether kubelet failed to start; for debugging see https://github.com/zhangguanzhang/Kubernetes-ansible/wiki/systemctl-running-debug
Save the token printed by init, then copy the kubectl kubeconfig into place; kubectl's default kubeconfig path is ~/.kube/config
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
The init yaml is actually stored in a configmap inside the cluster, so we can inspect it at any time; it is also used when other nodes and masters join:
kubectl -n kube-system get cm kubeadm-config -o yaml
If you run a single master and don't plan to add worker nodes, remove the taint from the master; skip this if you are about to do the multi-master steps:
kubectl taint nodes --all node-role.kubernetes.io/master-
Set up RBAC for the health endpoint
kube-apiserver's health-check route requires authorization; we open it up for monitoring or for SLB health checks. The yaml file is https://github.com/zhangguanzhang/Kubernetes-ansible-base/blob/roles/master/files/healthz-rbac.yml
kubectl apply -f https://raw.githubusercontent.com/zhangguanzhang/Kubernetes-ansible-base/roles/master/files/healthz-rbac.yml
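Once that RBAC is applied, an unauthenticated probe through the local proxy should return ok; something like the following (my addition; -k skips TLS verification):
curl -k https://apiserver.k8s.local:8443/healthz
# expected output: ok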
Configure the k8s control-plane components on the other masters
Manual copy (only needed on older versions that cannot upload certificates; skip this if kubeadm init above was run with the upload-certs option)
Copy the CA certificates from the first master to the other masters. Since scp prompts for a password, install sshpass; 'zhangguanzhang' here is the root password:
yum install sshpass -y
alias ssh='sshpass -p zhangguanzhang ssh -o StrictHostKeyChecking=no'
alias scp='sshpass -p zhangguanzhang scp -o StrictHostKeyChecking=no'
Copy the CA certificates to the other masters:
for node in 172.19.0.3 172.19.0.4;do
    ssh $node 'mkdir -p /etc/kubernetes/pki/etcd'
    scp -r /etc/kubernetes/pki/ca.* $node:/etc/kubernetes/pki/
    scp -r /etc/kubernetes/pki/sa.* $node:/etc/kubernetes/pki/
    scp -r /etc/kubernetes/pki/front-proxy-ca.* $node:/etc/kubernetes/pki/
    scp -r /etc/kubernetes/pki/etcd/ca.* $node:/etc/kubernetes/pki/etcd/
done
Join the other masters:
kubeadm join apiserver.k8s.local:8443 --token vo6qyo.4cm47w561q9p830v \
    --discovery-token-ca-cert-hash sha256:46e177c317037a4815c6deaab8089da4340663efeeead40810d4f53239256671 \
    --control-plane --certificate-key ba869da2d611e5afba5f9959a5f18891c20fb56d90592225765c0b965e3d8783
If you forget the token, list it with kubeadm token list, or create a new one with kubeadm token create.
The sha256 value can be obtained with the command below:
openssl x509 -pubkey -in \
    /etc/kubernetes/pki/ca.crt | \
    openssl rsa -pubin -outform der 2>/dev/null | \
    openssl dgst -sha256 -hex | sed 's/^.* //'
Install the kubectl completion script:
kubectl completion bash > /etc/bash_completion.d/kubectl
Configure etcdctl on all masters
Copy etcdctl out of the etcd container:
docker cp `docker ps -a | awk '/k8s_etcd/{print $1}'`:/usr/local/bin/etcdctl /usr/local/bin/etcdctl
Since roughly 1.13, k8s talks to etcd over the v3 API by default; set up the etcdctl parameters accordingly:
cat >/etc/profile.d/etcd.sh<<'EOF'
ETCD_CERET_DIR=/etc/kubernetes/pki/etcd/
ETCD_CA_FILE=ca.crt
ETCD_KEY_FILE=healthcheck-client.key
ETCD_CERT_FILE=healthcheck-client.crt
ETCD_EP=https://192.168.50.101:2379,https://192.168.50.102:2379,https://192.168.50.103:2379

alias etcd_v2="etcdctl --cert-file ${ETCD_CERET_DIR}/${ETCD_CERT_FILE} \
              --key-file ${ETCD_CERET_DIR}/${ETCD_KEY_FILE} \
              --ca-file  ${ETCD_CERET_DIR}/${ETCD_CA_FILE} \
              --endpoints $ETCD_EP"

alias etcd_v3="ETCDCTL_API=3 \
              etcdctl \
              --cert ${ETCD_CERET_DIR}/${ETCD_CERT_FILE} \
              --key ${ETCD_CERET_DIR}/${ETCD_KEY_FILE} \
              --cacert ${ETCD_CERET_DIR}/${ETCD_CA_FILE} \
              --endpoints $ETCD_EP"
EOF
Re-login over ssh, or load the environment manually with: . /etc/profile.d/etcd.sh
[root@k8s-m1 ~]# etcd_v3 endpoint status --write-out=table
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.50.101:2379 | 9fdaf6a25119065e |   3.4.7 |  3.1 MB |     false |      false |         5 |     305511 |             305511 |        |
| https://192.168.50.102:2379 | a3d9d41cf6d05e08 |   3.4.7 |  3.1 MB |      true |      false |         5 |     305511 |             305511 |        |
| https://192.168.50.103:2379 | 3b34476e501895d4 |   3.4.7 |  3.0 MB |     false |      false |         5 |     305511 |             305511 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
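With the aliases in place, the other routine etcd checks are one-liners too (my addition):
etcd_v3 endpoint health --write-out=table
etcd_v3 member list --write-out=table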
Set up an etcd backup script:
mkdir -p /opt/etcd
cat>/opt/etcd/etcd_cron.sh<<'EOF'
#!/bin/bash
set -e
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

: ${bak_dir:=/root/} # default backup directory; change it to an existing directory
: ${cert_dir:=/etc/kubernetes/pki/etcd/}
: ${endpoints:=https://192.168.50.101:2379,https://192.168.50.102:2379,https://192.168.50.103:2379}

bak_prefix='etcd-'
cmd_suffix='date +%Y-%m-%d-%H:%M'
bak_suffix='.db'

# normalize the command-line options into positional parameters ($1,$2,...)
temp=`getopt -n $0 -o c:d: -u -- "$@"`
[ $? != 0 ] && {
    echo '
Examples:
  # just save once
  bash $0 /tmp/etcd.db
  # save in crontab and keep 5
  bash $0 -c 5
'
    exit 1
}
set -- $temp

# -c  number of backup copies to keep
# -d  directory to store backups in
while true;do
    case "$1" in
        -c)
            [ -z "$bak_count" ] && bak_count=$2
            printf -v null %d "$bak_count" &>/dev/null || \
                { echo 'the value of -c must be a number';exit 1; }
            shift 2
            ;;
        -d)
            [ ! -d "$2" ] && mkdir -p $2
            bak_dir=$2
            shift 2
            ;;
        *)
            [[ -z "$1" || "$1" == '--' ]] && { shift;break; }
            echo "Internal error!"
            exit 1
            ;;
    esac
done

function etcd_v2(){
    etcdctl --cert-file $cert_dir/healthcheck-client.crt \
            --key-file  $cert_dir/healthcheck-client.key \
            --ca-file   $cert_dir/ca.crt \
            --endpoints $endpoints $@
}

function etcd_v3(){
    ETCDCTL_API=3 etcdctl \
        --cert $cert_dir/healthcheck-client.crt \
        --key  $cert_dir/healthcheck-client.key \
        --cacert $cert_dir/ca.crt \
        --endpoints $endpoints $@
}

etcd::cron::save(){
    cd $bak_dir/
    etcd_v3 snapshot save $bak_prefix$($cmd_suffix)$bak_suffix
    rm_files=`ls -t $bak_prefix*$bak_suffix | tail -n +$[bak_count+1]`
    if [ -n "$rm_files" ];then
        rm -f $rm_files
    fi
}

main(){
    [ -n "$bak_count" ] && etcd::cron::save || etcd_v3 snapshot save $@
}

main $@
EOF
Add the line below via crontab -e to automatically keep four backup copies:
bash /opt/etcd/etcd_cron.sh -c 4 -d /opt/etcd/ &>/dev/null
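A complete crontab entry also needs a schedule in front of that command; for example, to snapshot hourly (the schedule is my choice, adjust as needed):
0 * * * * bash /opt/etcd/etcd_cron.sh -c 4 -d /opt/etcd/ &>/dev/null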
Worker nodes
Follow the earlier steps on each node:
- apply the system configuration
- set the hostname
- install docker-ce
- set up hosts and nginx
- configure the repo and install kubeadm and kubelet
Joining works the same as for the masters: prepare the environment and docker ahead of time, then join without --control-plane. With only one master, the address in the join command is the controlPlaneEndpoint value.
kubeadm join apiserver.k8s.local:8443 --token vo6qyo.4cm47w561q9p830v \
    --discovery-token-ca-cert-hash sha256:46e177c317037a4815c6deaab8089da4340663efeeead40810d4f53239256671
[root@k8s-m1 ~]# kubectl get node
NAME        STATUS   ROLES    AGE    VERSION
k8s-m1      Ready    master   23h    v1.18.5
k8s-m2      Ready    master   23h    v1.18.5
k8s-m3      Ready    master   23h    v1.18.5
k8s-node1   Ready    node     23h    v1.18.5
k8s-node2   Ready    node     121m   v1.18.5
k8s-node3   Ready    node     82m    v1.18.5
Addons (everything from here to the end runs on any one master)
The pod network is not set up yet, so coredns cannot get an IP and stays Pending. I deploy flannel here; if you understand BGP you can use calico instead.
The yaml comes from flannel's official github: https://github.com/coreos/flannel/tree/master/Documentation
Changes to make:
- If you use PSP on a cluster older than 1.16, policy/v1beta1 must be changed to extensions/v1beta1; no change is needed here:
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
- Change the rbac apiVersion to v1 as below; do not keep using v1beta1. Use this command:
sed -ri '/apiVersion: rbac/s#v1.+#v1#' kube-flannel.yml
- The official yaml ships DaemonSets for four architectures; delete everything except amd64, roughly line 227 through the end:
sed -ri '227,$d' kube-flannel.yml
- If you changed the pod cidr, change it here as well. If all nodes are on the same layer-2 network, you can swap vxlan for the faster host-gw backend; with vxlan, the security group must allow udp port 8472:
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
- Adjust the limits; they must be larger than the requests:
          limits:
            cpu: "200m"
            memory: "100Mi"
Deploy flannel
(I did not hit this error this time, but for reference:) since 1.15 the node cidr is an array rather than a single value, and flannel 0.11 and earlier can fail with the error below; see:
https://github.com/kubernetes/kubernetes/blob/v1.15.0/staging/src/k8s.io/api/core/v1/types.go#L3890-L3893
https://github.com/kubernetes/kubernetes/blob/v1.18.2/staging/src/k8s.io/api/core/v1/types.go#L4206-L4216
Error registering network: failed to acquire lease: node "xxx" pod cidr not assigned
Patch it manually; remember to do the same for any node added later:
nodes=`kubectl get node --no-headers | awk '{print $1}'`
for node in $nodes;do
    cidr=`kubectl get node "$node" -o jsonpath='{.spec.podCIDRs[0]}'`
    [ -z "$(kubectl get node $node -o jsonpath='{.spec.podCIDR}')" ] && {
        kubectl patch node "$node" -p '{"spec":{"podCIDR":"'"$cidr"'"}}'
    }
done
The final kube-flannel.yml looks like this:
[root@k8s-m1 ~]# cat kube-flannel.yml
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.flannel.unprivileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false
  volumes:
    - configMap
    - secret
    - emptyDir
    - hostPath
  allowedHostPaths:
    - pathPrefix: "/etc/cni/net.d"
    - pathPrefix: "/etc/kube-flannel"
    - pathPrefix: "/run/flannel"
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  allowPrivilegeEscalation: false
  defaultAllowPrivilegeEscalation: false
  # Capabilities
  allowedCapabilities: ['NET_ADMIN']
  defaultAddCapabilities: []
  requiredDropCapabilities: []
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  # SELinux
  seLinux:
    # SELinux is unused in CaaSP
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
rules:
  - apiGroups: ['extensions']
    resources: ['podsecuritypolicies']
    verbs: ['use']
    resourceNames: ['psp.flannel.unprivileged']
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "host-gw"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds-amd64
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/os
                    operator: In
                    values:
                      - linux
                  - key: kubernetes.io/arch
                    operator: In
                    values:
                      - amd64
      hostNetwork: true
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni
        image: quay.io/coreos/flannel:v0.12.0-amd64
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.12.0-amd64
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "200m"
            memory: "100Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
I use the host-gw backend here because I ran into a kernel bug affecting vxlan/udp; for details see https://zhangguanzhang.github.io/2020/05/23/k8s-vxlan-63-timeout/
kubectl apply -f kube-flannel.yml
Verify the cluster
kubectl -n kube-system get pod -o wide
Once every pod in the kube-system namespace is Running, test that the cluster actually works:
cat<<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:alpine
        name: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: zhangguanzhang/centos
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF
Wait for the pods to reach Running, then verify cluster DNS:
$ kubectl exec -ti busybox -- nslookup kubernetes
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
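It is also worth confirming that a Service actually forwards traffic, for example by hitting the nginx service created above from the busybox pod (my addition; this assumes the zhangguanzhang/centos test image ships curl, as CentOS images normally do):
kubectl exec -ti busybox -- curl -s http://nginx | head -n 4   # should print the start of the nginx welcome page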
See the articles below for more detail about the kubeadm process and its many options.
Adding a new node
1. Initialize CentOS 7
The init script:
#!/bin/bash
#----keep time in sync----
echo "configuring time sync"
yum install chrony -y
mv /etc/chrony.conf /etc/chrony.conf.bak
cat>/etc/chrony.conf<<EOF
server ntp.aliyun.com iburst
stratumweight 0
driftfile /var/lib/chrony/drift
rtcsync
makestep 10 3
bindcmdaddress 127.0.0.1
bindcmdaddress ::1
keyfile /etc/chrony.keys
commandkey 1
generatecommandkey
logchange 0.5
logdir /var/log/chrony
EOF
/usr/bin/systemctl enable chronyd
/usr/bin/systemctl restart chronyd

#---disable swap---
echo "disabling swap"
swapoff -a && sysctl -w vm.swappiness=0
sed -ri '/^[^#]*swap/s@^@#@' /etc/fstab

#---disable firewalld and selinux---
echo "disabling firewalld and selinux"
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -ri '/^[^#]*SELINUX=/s#=.+$#=disabled#' /etc/selinux/config

#---disable NetworkManager---
echo "disabling NetworkManager"
systemctl disable NetworkManager
systemctl stop NetworkManager

#---install the epel repo and switch to the aliyun epel mirror---
yum install epel-release wget -y
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo

#---install dependencies---
echo "installing dependencies"
yum install -y \
    curl \
    git \
    conntrack-tools \
    psmisc \
    nfs-utils \
    jq \
    socat \
    bash-completion \
    ipset \
    ipvsadm \
    conntrack \
    libseccomp \
    net-tools \
    crontabs \
    sysstat \
    unzip \
    iftop \
    nload \
    strace \
    bind-utils \
    tcpdump \
    telnet \
    lsof \
    htop

#---ipvs mode needs these modules loaded at boot---
echo "enabling boot-time loading of ipvs modules"
cat>/etc/modules-load.d/ipvs.conf<<EOF
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
br_netfilter
EOF
systemctl daemon-reload
systemctl enable --now systemd-modules-load.service

#---set kernel parameters---
cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv4.neigh.default.gc_stale_time = 120
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
net.ipv4.ip_forward = 1
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2
# make bridged traffic visible to iptables (needed by kube-proxy/CNI)
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1
net.netfilter.nf_conntrack_max = 2310720
fs.inotify.max_user_watches=89100
fs.may_detach_mounts = 1
fs.file-max = 52706963
fs.nr_open = 52706963
vm.overcommit_memory=1
vm.panic_on_oom=0
# https://github.com/moby/moby/issues/31208
# ipvsadm -l --timeout
# fixes long-connection timeouts under ipvs mode; any value below 900 works
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 10
EOF
sysctl --system

#---tune journal logging---
sed -ri 's/^\$ModLoad imjournal/#&/' /etc/rsyslog.conf
sed -ri 's/^\$IMJournalStateFile/#&/' /etc/rsyslog.conf
sed -ri 's/^#(DefaultLimitCORE)=/\1=100000/' /etc/systemd/system.conf
sed -ri 's/^#(DefaultLimitNOFILE)=/\1=100000/' /etc/systemd/system.conf
sed -ri 's/^#(UseDNS )yes/\1no/' /etc/ssh/sshd_config

#---raise the maximum number of open files---
cat>/etc/security/limits.d/kubernetes.conf<<EOF
* soft nproc 131072
* hard nproc 131072
* soft nofile 131072
* hard nofile 131072
root soft nproc 131072
root hard nproc 131072
root soft nofile 131072
root hard nofile 131072
EOF

#---set user_namespace.enable=1---
grubby --args="user_namespace.enable=1" --update-kernel="$(grubby --default-kernel)"
2. Build and install nginx
yum install gcc gcc-c++ -y
tar zxvf nginx-1.16.1.tar.gz
cd nginx-1.16.1/
./configure --with-stream --without-http --prefix=/usr/local/kube-nginx --without-http_uwsgi_module --without-http_scgi_module --without-http_fastcgi_module
make && make install
groupadd nginx
useradd -r -g nginx nginx
# copy /etc/kubernetes/nginx.conf and the kube-nginx.service unit from Part 5 onto the new node before starting
systemctl daemon-reload && systemctl enable kube-nginx && systemctl restart kube-nginx
3. Regenerate the token
kubeadm token create
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
kubeadm join apiserver.k8s.local:8443 --token 8ceduc.cy0r23j2hpsw80ff --discovery-token-ca-cert-hash sha256:46e177c317037a4815c6deaab8089da4340663efeeead40810d4f53239256671
error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "vo6qyo"
When a join fails with this error, the old token has expired or is invalid, and you need to regenerate it as shown above.
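As a shortcut (my addition, not in the original), kubeadm can also print a ready-to-use join command, covering both the token and the ca-cert hash in one step:
kubeadm token create --print-join-command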