00. Component Versions and Configuration Strategy
Component versions
- Kubernetes 1.14.2
- Docker 18.09.6-ce
- Etcd 3.3.13
- Flanneld 0.11.0
- Plugins:
- Coredns
- Dashboard
- Metrics-server
- EFK (elasticsearch, fluentd, kibana)
- Image registries:
- docker registry
- harbor
Main configuration strategy
kube-apiserver:
- High availability via a layer-4 transparent nginx proxy local to each node;
- Insecure port 8080 and anonymous access disabled;
- Serves https requests on secure port 6443;
- Strict authentication and authorization policies (x509, token, RBAC);
- Bootstrap token authentication enabled, supporting kubelet TLS bootstrapping;
- Accesses kubelet and etcd over https, encrypting all traffic;
kube-controller-manager:
- 3-node high availability;
- Insecure port disabled; serves https requests on secure port 10252;
- Accesses the apiserver's secure port via kubeconfig;
- Automatically approves kubelet certificate signing requests (CSRs) and rotates certificates when they expire;
- Each controller accesses the apiserver with its own ServiceAccount;
kube-scheduler:
- 3-node high availability;
- Accesses the apiserver's secure port via kubeconfig;
kubelet:
- Bootstrap tokens are created dynamically with kubeadm rather than configured statically in the apiserver;
- Client and server certificates are generated automatically via the TLS bootstrap mechanism and rotated when they expire;
- Main parameters are configured in a JSON file of type KubeletConfiguration;
- Read-only port disabled; serves https requests on secure port 10250 with authentication and authorization, rejecting anonymous and unauthorized access;
- Accesses the apiserver's secure port via kubeconfig;
kube-proxy:
- Accesses the apiserver's secure port via kubeconfig;
- Main parameters are configured in a JSON file of type KubeProxyConfiguration;
- Uses the ipvs proxy mode;
Cluster plugins:
- DNS: coredns, which has better functionality and performance;
- Dashboard: with login authentication enabled;
- Metrics: metrics-server, accessing the kubelet secure port over https;
- Logging: Elasticsearch, Fluentd, Kibana;
- Registry: docker-registry, harbor;
01. System Initialization and Global Variables
Cluster machines
- kube-node1:192.168.75.110
- kube-node2:192.168.75.111
- kube-node3:192.168.75.112
Note:
- The etcd cluster, master nodes, and worker nodes in this document all use these three machines;
- The initialization commands in this document must be run on every machine;
- These commands must be run as root;
- Unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely;
Hostname
Set a permanent hostname, then log in again:
$ sudo hostnamectl set-hostname kube-node1 # replace kube-node1 with this host's name
- The configured hostname is saved in the /etc/hostname file;
If DNS cannot resolve the hostnames, add hostname-to-IP mappings to the /etc/hosts file on every machine:
cat >> /etc/hosts <<EOF
192.168.75.110 kube-node1
192.168.75.111 kube-node2
192.168.75.112 kube-node3
EOF
Add a docker account
Add a docker account on every machine:
useradd -m docker
Passwordless ssh login to the other nodes
Unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.
Allow kube-node1's root account to log in to all nodes without a password:
ssh-keygen -t rsa
ssh-copy-id root@kube-node1
ssh-copy-id root@kube-node2
ssh-copy-id root@kube-node3
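To confirm that the key distribution worked, a quick check such as the following (a minimal sketch using the three hostnames above) should print each hostname without any password prompt:
for node_name in kube-node1 kube-node2 kube-node3
do
  ssh root@${node_name} "hostname"   # should print the node's hostname with no password prompt
done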
Update the PATH variable
Add the binary directory to the PATH environment variable:
mkdir -p /opt/k8s/bin
echo 'PATH=/opt/k8s/bin:$PATH' >>/root/.bashrc
source /root/.bashrc
Install dependency packages
Install the dependency packages on every machine:
CentOS:
yum install -y epel-release
yum install -y conntrack ntpdate ntp ipvsadm ipset jq iptables curl sysstat libseccomp wget
Ubuntu:
apt-get install -y conntrack ipvsadm ntp ipset jq iptables curl sysstat libseccomp
- ipvs depends on ipset;
- ntp keeps the machines' system clocks in sync;
Disable the firewall
On every machine, stop the firewall, flush its rules, and set the default forward policy:
systemctl stop firewalld
systemctl disable firewalld
iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat
iptables -P FORWARD ACCEPT
Disable the swap partition
If a swap partition is enabled, kubelet fails to start (this can be overridden by setting the --fail-swap-on parameter to false), so swap must be disabled on every machine. Also comment out the corresponding entries in /etc/fstab to prevent the swap partition from being mounted at boot:
swapoff -a
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
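To verify that swap is really off (a simple check, not part of the original steps), the Swap line of free should show 0 and /proc/swaps should list no devices:
free -h          # the Swap line should show 0B total/used/free
cat /proc/swaps  # only the header line should remain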
Disable SELinux
Disable SELinux, otherwise later K8S mount operations may fail with Permission denied:
setenforce 0
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
Disable dnsmasq (optional)
When a Linux system has dnsmasq enabled (e.g. in GUI environments), the system DNS server is set to 127.0.0.1, which prevents docker containers from resolving domain names, so it must be disabled:
systemctl stop dnsmasq
systemctl disable dnsmasq
Load kernel modules
modprobe ip_vs_rr
modprobe br_netfilter
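modprobe only loads the modules for the current boot. To make them persist across reboots, a sketch along these lines writes them into systemd's modules-load directory (the file name k8s.conf is arbitrary):
cat > /etc/modules-load.d/k8s.conf <<EOF
ip_vs_rr
br_netfilter
EOF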
Tune kernel parameters
cat > kubernetes.conf <<EOF
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
net.ipv4.tcp_tw_recycle=0
# Disallow using swap space; only allow it when the system is OOM
vm.swappiness=0
# Do not check whether physical memory is sufficient
vm.overcommit_memory=1
# Keep the OOM killer enabled instead of panicking
vm.panic_on_oom=0
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=1048576
fs.file-max=52706963
fs.nr_open=52706963
net.ipv6.conf.all.disable_ipv6=1
net.netfilter.nf_conntrack_max=2310720
vm.max_map_count=655360
EOF
cp kubernetes.conf /etc/sysctl.d/kubernetes.conf
sysctl -p /etc/sysctl.d/kubernetes.conf
- tcp_tw_recycle must be disabled, otherwise it conflicts with NAT and breaks connectivity;
- IPv6 is disabled to avoid triggering a docker bug;
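The sysctl file above is only applied locally; since these parameters are needed on every machine, a loop like the following (a minimal sketch using the three node IPs above) can distribute and apply it everywhere:
for node_ip in 192.168.75.110 192.168.75.111 192.168.75.112
do
  scp kubernetes.conf root@${node_ip}:/etc/sysctl.d/kubernetes.conf
  ssh root@${node_ip} "sysctl -p /etc/sysctl.d/kubernetes.conf"
done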
Set the system timezone
# Adjust the system TimeZone
timedatectl set-timezone Asia/Shanghai
# Write the current UTC time to the hardware clock
timedatectl set-local-rtc 0
# Restart services that depend on the system time
systemctl restart rsyslog
systemctl restart crond
Stop unrelated services
systemctl stop postfix && systemctl disable postfix
Configure rsyslogd and systemd journald
systemd's journald is the default logging tool on CentOS 7; it records all system, kernel, and Service Unit logs.
Compared with rsyslogd, journald's logging has the following advantages:
- It can log to memory or to the file system (by default it logs to memory, under /run/log/journal);
- It can cap the disk space used and guarantee remaining free space;
- It can limit log file size and retention time;
journald forwards logs to rsyslog by default, which duplicates log writes; /var/log/messages ends up full of irrelevant entries, making later inspection harder and also hurting system performance.
mkdir /var/log/journal # directory for persistent log storage
mkdir /etc/systemd/journald.conf.d
cat > /etc/systemd/journald.conf.d/99-prophet.conf <<EOF
[Journal]
# Persist logs to disk
Storage=persistent
# Compress historical logs
Compress=yes
SyncIntervalSec=5m
RateLimitInterval=30s
RateLimitBurst=1000
# Cap total disk usage at 10G
SystemMaxUse=10G
# Cap a single log file at 200M
SystemMaxFileSize=200M
# Keep logs for 2 weeks
MaxRetentionSec=2week
# Do not forward logs to syslog
ForwardToSyslog=no
EOF
systemctl restart systemd-journald
Create directories
Create the directories:
mkdir -p /opt/k8s/{bin,work} /etc/{kubernetes,etcd}/cert
Upgrade the kernel
The stock 3.10.x kernel on CentOS 7.x has bugs that make Docker and Kubernetes unstable, for example:
- Recent docker versions (1.13 and later) enable the kernel memory accounting feature that is only experimental in the 3.10 kernel (and cannot be turned off); under node pressure, e.g. frequently starting and stopping containers, this causes cgroup memory leaks;
- A network device reference-count leak leads to errors like: "kernel:unregister_netdevice: waiting for eth0 to become free. Usage count = 1";
Possible solutions:
- Upgrade the kernel to 4.4.x or later;
- Or compile the kernel manually, disabling the CONFIG_MEMCG_KMEM feature;
- Or install Docker 18.09.1 or later, which fixes the issue. However, since kubelet also sets kmem (it vendors runc), kubelet must be recompiled with GOFLAGS="-tags=nokmem":
git clone --branch v1.14.1 --single-branch --depth 1 https://github.com/kubernetes/kubernetes
cd kubernetes
KUBE_GIT_VERSION=v1.14.1 ./build/run.sh make kubelet GOFLAGS="-tags=nokmem"
This document takes the kernel-upgrade approach:
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
# After installation, check that the corresponding kernel menuentry in /boot/grub2/grub.cfg contains an initrd16 line; if not, install again!
yum --enablerepo=elrepo-kernel install -y kernel-lt
# Boot from the new kernel by default
grub2-set-default 0
Install the kernel source files (optional; run after upgrading the kernel and rebooting):
# yum erase kernel-headers
yum --enablerepo=elrepo-kernel install kernel-lt-devel-$(uname -r) kernel-lt-headers-$(uname -r)
Disable NUMA
cp /etc/default/grub{,.bak}
vim /etc/default/grub # add the `numa=off` parameter to the GRUB_CMDLINE_LINUX line, as shown below:
diff /etc/default/grub.bak /etc/default/grub
6c6
< GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rhgb quiet"
---
> GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rhgb quiet numa=off"
Regenerate the grub2 configuration file:
cp /boot/grub2/grub.cfg{,.bak}
grub2-mkconfig -o /boot/grub2/grub.cfg
Distribute the cluster environment-variable script
All environment variables used later are defined in the file environment.sh; adjust them to your own machines and network, then copy the script to the /opt/k8s/bin directory on all nodes:
#!/usr/bin/bash
# Generate the encryption key needed by EncryptionConfig
export ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64)
# Array of the cluster machines' IPs
export NODE_IPS=(192.168.75.110 192.168.75.111 192.168.75.112)
# Array of the hostnames corresponding to the cluster IPs
export NODE_NAMES=(kube-node1 kube-node2 kube-node3)
# etcd cluster service address list
export ETCD_ENDPOINTS="https://192.168.75.110:2379,https://192.168.75.111:2379,https://192.168.75.112:2379"
# IPs and ports for etcd inter-cluster communication
export ETCD_NODES="kube-node1=https://192.168.75.110:2380,kube-node2=https://192.168.75.111:2380,kube-node3=https://192.168.75.112:2380"
# Address and port of the kube-apiserver reverse proxy (kube-nginx)
export KUBE_APISERVER="https://127.0.0.1:8443"
# Name of the network interface interconnecting the nodes
export VIP_IF="ens33"
# etcd data directory
export ETCD_DATA_DIR="/data/k8s/etcd/data"
# etcd WAL directory; an SSD partition, or at least a different partition from ETCD_DATA_DIR, is recommended
export ETCD_WAL_DIR="/data/k8s/etcd/wal"
# Data directory for the k8s components
export K8S_DIR="/data/k8s/k8s"
# docker data directory
export DOCKER_DIR="/data/k8s/docker"
## The parameters below usually do not need to be changed
# Token used for TLS Bootstrapping; can be generated with: head -c 16 /dev/urandom | od -An -t x | tr -d ' '
BOOTSTRAP_TOKEN="41f7e4ba8b7be874fcff18bf5cf41a7c"
# Preferably use currently unused ranges for the service and Pod networks
# Service network; unroutable before deployment, routable inside the cluster afterwards (guaranteed by kube-proxy)
SERVICE_CIDR="10.254.0.0/16"
# Pod network; a /16 range is recommended; unroutable before deployment, routable inside the cluster afterwards (guaranteed by flanneld)
CLUSTER_CIDR="172.30.0.0/16"
# Service port range (NodePort Range)
export NODE_PORT_RANGE="30000-32767"
# flanneld network configuration prefix
export FLANNEL_ETCD_PREFIX="/kubernetes/network"
# kubernetes service IP (usually the first IP in SERVICE_CIDR)
export CLUSTER_KUBERNETES_SVC_IP="10.254.0.1"
# Cluster DNS service IP (pre-allocated from SERVICE_CIDR)
export CLUSTER_DNS_SVC_IP="10.254.0.2"
# Cluster DNS domain (without a trailing dot)
export CLUSTER_DNS_DOMAIN="cluster.local"
# Add the binary directory /opt/k8s/bin to PATH
export PATH=/opt/k8s/bin:$PATH
source environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp environment.sh root@${node_ip}:/opt/k8s/bin/
ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
done
# This script uses the user@IP form, which still prompts for the corresponding user's password. Given the passwordless ssh login configured earlier, consider switching to the hostname form instead:
source environment.sh
for node_name in ${NODE_NAMES[@]}
do
echo ">>> ${node_name}"
scp environment.sh root@${node_name}:/opt/k8s/bin/
ssh root@${node_name} "chmod +x /opt/k8s/bin/*"
done
References
- System kernel parameter reference: https://docs.openshift.com/enterprise/3.2/admin_guide/overcommit.html
- Discussion of the 3.10.x kernel kmem bugs and workarounds:
02. Create the CA Certificate and Key
For security, the kubernetes components use x509 certificates to encrypt and authenticate their communication with each other.
The CA (Certificate Authority) is a self-signed root certificate used to sign the other certificates created later.
This document uses CloudFlare's PKI toolkit cfssl to create all certificates.
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.
Install the cfssl toolkit
mkdir -p /opt/k8s/cert && cd /opt/k8s
wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
mv cfssl_linux-amd64 /opt/k8s/bin/cfssl
wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
mv cfssljson_linux-amd64 /opt/k8s/bin/cfssljson
wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
mv cfssl-certinfo_linux-amd64 /opt/k8s/bin/cfssl-certinfo
chmod +x /opt/k8s/bin/*
export PATH=/opt/k8s/bin:$PATH
Create the root certificate (CA)
The CA certificate is shared by all nodes in the cluster; it only needs to be created once, and it signs every certificate created afterwards.
Create the configuration file
The CA configuration file defines the root certificate's usage profiles and concrete parameters (usages, expiry, server auth, client auth, encryption, etc.); when signing other certificates later, a specific profile is selected.
cd /opt/k8s/work
cat > ca-config.json <<EOF
{
"signing": {
"default": {
"expiry": "87600h"
},
"profiles": {
"kubernetes": {
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
],
"expiry": "87600h"
}
}
}
}
EOF
- signing: the certificate can be used to sign other certificates; the generated ca.pem has CA=TRUE;
- server auth: a client can use this CA to verify certificates presented by servers;
- client auth: a server can use this CA to verify certificates presented by clients;
Create the certificate signing request file
cd /opt/k8s/work
cat > ca-csr.json <<EOF
{
"CN": "kubernetes",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "4Paradigm"
}
],
"ca": {
"expiry": "876000h"
}
}
EOF
- CN: Common Name; kube-apiserver extracts this field from the certificate as the requesting User Name, and browsers use it to verify a site's legitimacy;
- O: Organization; kube-apiserver extracts this field from the certificate as the requesting user's Group;
- kube-apiserver uses the extracted User and Group as the identity for RBAC authorization;
Generate the CA certificate and private key
cd /opt/k8s/work
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
ls ca*
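To sanity-check the generated CA (an optional step), cfssl-certinfo prints the certificate fields as JSON; verify that the CN is kubernetes and that the validity matches the 876000h requested in ca-csr.json:
cfssl-certinfo -cert ca.pem   # check the subject common_name and the not_after field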
Distribute the certificate files
Copy the generated CA certificate, key file, and configuration file to the /etc/kubernetes/cert directory on all nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /etc/kubernetes/cert"
scp ca*.pem ca-config.json root@${node_ip}:/etc/kubernetes/cert
done
# This script uses the user@IP form, which still prompts for the corresponding user's password. Given the passwordless ssh login configured earlier, consider switching to the hostname form instead:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
echo ">>> ${node_name}"
ssh root@${node_name} "mkdir -p /etc/kubernetes/cert"
scp ca*.pem ca-config.json root@${node_name}:/etc/kubernetes/cert
done
References
1. CA certificate types: https://github.com/kubernetes-incubator/apiserver-builder/blob/master/docs/concepts/auth.md
03. Deploy the kubectl Command-Line Tool
This document describes how to install and configure kubectl, the command-line management tool for the kubernetes cluster.
By default kubectl reads the kube-apiserver address and authentication information from the ~/.kube/config file; without it, running kubectl may fail with:
$ kubectl get pods
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Note:
- Unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely;
- This step only needs to be done once; the generated kubeconfig file is universal and can be copied to any machine that needs to run kubectl, renamed to ~/.kube/config;
Download and distribute the kubectl binary
Download and unpack:
cd /opt/k8s/work
# If the direct download is slow, download with a download manager (e.g. Xunlei) first and then upload the file
wget https://dl.k8s.io/v1.14.2/kubernetes-client-linux-amd64.tar.gz
tar -xzvf kubernetes-client-linux-amd64.tar.gz
Distribute it to every node that will use kubectl:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kubernetes/client/bin/kubectl root@${node_ip}:/opt/k8s/bin/
ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
done
# Hostname-based variant of the script
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
echo ">>> ${node_name}"
scp kubernetes/client/bin/kubectl root@${node_name}:/opt/k8s/bin/
ssh root@${node_name} "chmod +x /opt/k8s/bin/*"
done
Create the admin certificate and private key
kubectl talks to the apiserver's https secure port; the apiserver authenticates and authorizes the presented certificate.
As the cluster's management tool, kubectl needs the highest privileges, so here we create an admin certificate with full permissions.
Create the certificate signing request:
cd /opt/k8s/work
cat > admin-csr.json <<EOF
{
"CN": "admin",
"hosts": [],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "system:masters",
"OU": "4Paradigm"
}
]
}
EOF
- O is system:masters; when kube-apiserver receives this certificate it sets the request's Group to system:masters;
- The predefined ClusterRoleBinding cluster-admin binds the Group system:masters to the Role cluster-admin, which grants access to all APIs;
- This certificate is only used by kubectl as a client certificate, so the hosts field is empty;
Generate the certificate and private key:
cd /opt/k8s/work
cfssl gencert -ca=/opt/k8s/work/ca.pem \
-ca-key=/opt/k8s/work/ca-key.pem \
-config=/opt/k8s/work/ca-config.json \
-profile=kubernetes admin-csr.json | cfssljson -bare admin
ls admin*
Create the kubeconfig file
The kubeconfig file is kubectl's configuration file; it contains all the information needed to access the apiserver, such as the apiserver address, the CA certificate, and the client's own certificate;
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
# Set cluster parameters
kubectl config set-cluster kubernetes \
--certificate-authority=/opt/k8s/work/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kubectl.kubeconfig
# Set client authentication parameters
kubectl config set-credentials admin \
--client-certificate=/opt/k8s/work/admin.pem \
--client-key=/opt/k8s/work/admin-key.pem \
--embed-certs=true \
--kubeconfig=kubectl.kubeconfig
# Set context parameters
kubectl config set-context kubernetes \
--cluster=kubernetes \
--user=admin \
--kubeconfig=kubectl.kubeconfig
# Set the default context
kubectl config use-context kubernetes --kubeconfig=kubectl.kubeconfig
- --certificate-authority: the root certificate used to verify the kube-apiserver certificate;
- --client-certificate, --client-key: the admin certificate and key just generated, used when connecting to kube-apiserver;
- --embed-certs=true: embeds the contents of ca.pem and admin.pem into the generated kubectl.kubeconfig file (without it, only the certificate file paths are written, and the certificate files would have to be copied separately whenever the kubeconfig is copied to another machine, which is inconvenient);
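Before distributing the file, it can be inspected (an optional check): kubectl config view prints the clusters, users, and contexts, with embedded certificates shown as REDACTED:
kubectl config view --kubeconfig=kubectl.kubeconfig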
Distribute the kubeconfig file
Distribute it to every node that runs kubectl commands:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p ~/.kube"
scp kubectl.kubeconfig root@${node_ip}:~/.kube/config
done
# Hostname-based variant of the script
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
echo ">>> ${node_name}"
ssh root@${node_name} "mkdir -p ~/.kube"
scp kubectl.kubeconfig root@${node_name}:~/.kube/config
done
- The file is saved as ~/.kube/config;
04. Deploy the etcd Cluster
etcd is a Raft-based distributed key-value store developed by CoreOS, commonly used for service discovery, shared configuration, and concurrency control (leader election, distributed locks, etc.). kubernetes stores all of its runtime data in etcd.
This document covers deploying a three-node high-availability etcd cluster:
- Download and distribute the etcd binaries;
- Create x509 certificates for the etcd cluster nodes, encrypting traffic between clients (e.g. etcdctl) and the etcd cluster as well as between etcd members;
- Create the etcd systemd unit file and configure service parameters;
- Check the cluster's health;
The IPs and names of the etcd cluster nodes are:
- 192.168.75.110 kube-node1
- 192.168.75.111 kube-node2
- 192.168.75.112 kube-node3
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.
Download and distribute the etcd binaries
Download the latest release from the etcd release page:
cd /opt/k8s/work
wget https://github.com/coreos/etcd/releases/download/v3.3.13/etcd-v3.3.13-linux-amd64.tar.gz
tar -xvf etcd-v3.3.13-linux-amd64.tar.gz
Distribute the binaries to all cluster nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp etcd-v3.3.13-linux-amd64/etcd* root@${node_ip}:/opt/k8s/bin
ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
done
# Hostname-based variant of the script
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
echo ">>> ${node_name}"
scp etcd-v3.3.13-linux-amd64/etcd* root@${node_name}:/opt/k8s/bin
ssh root@${node_name} "chmod +x /opt/k8s/bin/*"
done
Create the etcd certificate and private key
Create the certificate signing request:
cd /opt/k8s/work
cat > etcd-csr.json <<EOF
{
"CN": "etcd",
"hosts": [
"127.0.0.1",
"192.168.75.110",
"192.168.75.111",
"192.168.75.112"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "4Paradigm"
}
]
}
EOF
- The hosts field lists the etcd node IPs or domain names authorized to use this certificate; all three etcd cluster node IPs must be included;
Generate the certificate and private key:
cd /opt/k8s/work
cfssl gencert -ca=/opt/k8s/work/ca.pem \
-ca-key=/opt/k8s/work/ca-key.pem \
-config=/opt/k8s/work/ca-config.json \
-profile=kubernetes etcd-csr.json | cfssljson -bare etcd
ls etcd*pem
Distribute the generated certificate and private key to each etcd node:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /etc/etcd/cert"
scp etcd*.pem root@${node_ip}:/etc/etcd/cert/
done
# Hostname-based variant of the script
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
echo ">>> ${node_name}"
ssh root@${node_name} "mkdir -p /etc/etcd/cert"
scp etcd*.pem root@${node_name}:/etc/etcd/cert/
done
Create the etcd systemd unit template file
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > etcd.service.template <<EOF
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos
[Service]
Type=notify
WorkingDirectory=${ETCD_DATA_DIR}
ExecStart=/opt/k8s/bin/etcd \\
--data-dir=${ETCD_DATA_DIR} \\
--wal-dir=${ETCD_WAL_DIR} \\
--name=##NODE_NAME## \\
--cert-file=/etc/etcd/cert/etcd.pem \\
--key-file=/etc/etcd/cert/etcd-key.pem \\
--trusted-ca-file=/etc/kubernetes/cert/ca.pem \\
--peer-cert-file=/etc/etcd/cert/etcd.pem \\
--peer-key-file=/etc/etcd/cert/etcd-key.pem \\
--peer-trusted-ca-file=/etc/kubernetes/cert/ca.pem \\
--peer-client-cert-auth \\
--client-cert-auth \\
--listen-peer-urls=https://##NODE_IP##:2380 \\
--initial-advertise-peer-urls=https://##NODE_IP##:2380 \\
--listen-client-urls=https://##NODE_IP##:2379,http://127.0.0.1:2379 \\
--advertise-client-urls=https://##NODE_IP##:2379 \\
--initial-cluster-token=etcd-cluster-0 \\
--initial-cluster=${ETCD_NODES} \\
--initial-cluster-state=new \\
--auto-compaction-mode=periodic \\
--auto-compaction-retention=1 \\
--max-request-bytes=33554432 \\
--quota-backend-bytes=6442450944 \\
--heartbeat-interval=250 \\
--election-timeout=2000
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
- WorkingDirectory, --data-dir: the working and data directory is ${ETCD_DATA_DIR}; it must be created before starting the service;
- --wal-dir: the WAL directory; for better performance, use an SSD or a different disk than --data-dir;
- --name: the node name; when --initial-cluster-state is new, the value of --name must appear in the --initial-cluster list;
- --cert-file, --key-file: the certificate and key etcd uses when talking to clients;
- --trusted-ca-file: the CA certificate that signed the client certificates, used to verify them;
- --peer-cert-file, --peer-key-file: the certificate and key etcd uses when talking to its peers;
- --peer-trusted-ca-file: the CA certificate that signed the peer certificates, used to verify them;
Create and distribute the etcd systemd unit file for each node
Substitute the template variables to create a systemd unit file per node:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for (( i=0; i < 3; i++ ))
do
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" etcd.service.template > etcd-${NODE_IPS[i]}.service
done
ls *.service
- NODE_NAMES and NODE_IPS are bash arrays of equal length holding the node names and their corresponding IPs;
Distribute the generated systemd unit files:
cd /opt/k8s/work
# The generated etcd.service files are distinguished by IP, so the hostname form cannot be used here
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp etcd-${node_ip}.service root@${node_ip}:/etc/systemd/system/etcd.service
done
- The file is renamed to etcd.service;
Start the etcd service
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p ${ETCD_DATA_DIR} ${ETCD_WAL_DIR}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable etcd && systemctl restart etcd " &
done
- The etcd data and working directories must be created first;
- On first start, the etcd process waits for the other nodes' etcd instances to join the cluster, so the systemctl start etcd command hangs for a while; this is normal;
Check the startup result
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status etcd|grep Active"
done
Make sure the status is active (running); otherwise check the logs to find the cause:
journalctl -u etcd
Verify the service status
After the etcd cluster is deployed, run the following on any etcd node:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ETCDCTL_API=3 /opt/k8s/bin/etcdctl \
--endpoints=https://${node_ip}:2379 \
--cacert=/etc/kubernetes/cert/ca.pem \
--cert=/etc/etcd/cert/etcd.pem \
--key=/etc/etcd/cert/etcd-key.pem endpoint health
done
Expected output:
>>> 192.168.75.110
https://192.168.75.110:2379 is healthy: successfully committed proposal: took = 69.349466ms
>>> 192.168.75.111
https://192.168.75.111:2379 is healthy: successfully committed proposal: took = 2.989018ms
>>> 192.168.75.112
https://192.168.75.112:2379 is healthy: successfully committed proposal: took = 1.926582ms
When all outputs say healthy, the cluster is serving normally.
View the current leader
source /opt/k8s/bin/environment.sh
ETCDCTL_API=3 /opt/k8s/bin/etcdctl \
-w table --cacert=/etc/kubernetes/cert/ca.pem \
--cert=/etc/etcd/cert/etcd.pem \
--key=/etc/etcd/cert/etcd-key.pem \
--endpoints=${ETCD_ENDPOINTS} endpoint status
Output:
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://192.168.75.110:2379 | f3373394e2909c16 | 3.3.13 | 20 kB | true | 2 | 8 |
| https://192.168.75.111:2379 | bd1095e88a91da45 | 3.3.13 | 20 kB | false | 2 | 8 |
| https://192.168.75.112:2379 | 110570bfaa8447c2 | 3.3.13 | 20 kB | false | 2 | 8 |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
- As shown, the current leader is 192.168.75.110.
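The cluster membership can also be listed (an optional check using the same client certificate):
source /opt/k8s/bin/environment.sh
ETCDCTL_API=3 /opt/k8s/bin/etcdctl \
  -w table --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/etcd/cert/etcd.pem \
  --key=/etc/etcd/cert/etcd-key.pem \
  --endpoints=${ETCD_ENDPOINTS} member list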
05. Deploy the flannel Network
kubernetes requires that all nodes in the cluster (including master nodes) can reach each other over the Pod network. flannel uses vxlan to create an interconnected Pod network across nodes, using UDP port 8472 (this port must be opened, e.g. on public clouds such as AWS).
When flanneld starts for the first time, it reads the configured Pod network from etcd, allocates an unused subnet for the local node, and creates the flannel.1 network interface (the name may differ, e.g. flannel1).
flannel writes the Pod subnet allocated to the node into the /run/flannel/docker file; docker later uses the environment variables in this file to configure the docker0 bridge, so that all Pod containers on the node get IPs from that subnet.
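For illustration, the file written by mk-docker-opts.sh looks roughly like the following (the exact values depend on the subnet assigned to the node; this example assumes the 172.30.24.0/21 subnet seen later):
$ cat /run/flannel/docker
DOCKER_OPT_BIP="--bip=172.30.24.1/21"
DOCKER_OPT_IPMASQ="--ip-masq=false"
DOCKER_OPT_MTU="--mtu=1450"
DOCKER_NETWORK_OPTIONS=" --bip=172.30.24.1/21 --ip-masq=false --mtu=1450"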
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.
Download and distribute the flanneld binaries
Download the latest release from the flannel release page:
cd /opt/k8s/work
mkdir flannel
wget https://github.com/coreos/flannel/releases/download/v0.11.0/flannel-v0.11.0-linux-amd64.tar.gz
tar -xzvf flannel-v0.11.0-linux-amd64.tar.gz -C flannel
Distribute the binaries to all cluster nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp flannel/{flanneld,mk-docker-opts.sh} root@${node_ip}:/opt/k8s/bin/
ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
done
Create the flannel certificate and private key
flanneld reads and writes subnet allocation data in the etcd cluster, and etcd has mutual x509 certificate authentication enabled, so flanneld needs its own certificate and key.
Create the certificate signing request:
cd /opt/k8s/work
cat > flanneld-csr.json <<EOF
{
"CN": "flanneld",
"hosts": [],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "4Paradigm"
}
]
}
EOF
- This certificate is only used by flanneld as a client certificate, so the hosts field is empty;
Generate the certificate and private key:
cfssl gencert -ca=/opt/k8s/work/ca.pem \
-ca-key=/opt/k8s/work/ca-key.pem \
-config=/opt/k8s/work/ca-config.json \
-profile=kubernetes flanneld-csr.json | cfssljson -bare flanneld
ls flanneld*pem
Distribute the generated certificate and private key to all nodes (master and worker):
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /etc/flanneld/cert"
scp flanneld*.pem root@${node_ip}:/etc/flanneld/cert
done
Write the cluster Pod network configuration to etcd
Note: this step only needs to be performed once.
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--ca-file=/opt/k8s/work/ca.pem \
--cert-file=/opt/k8s/work/flanneld.pem \
--key-file=/opt/k8s/work/flanneld-key.pem \
mk ${FLANNEL_ETCD_PREFIX}/config '{"Network":"'${CLUSTER_CIDR}'", "SubnetLen": 21, "Backend": {"Type": "vxlan"}}'
- The current flanneld version (v0.11.0) does not support etcd v3, so the configuration key and network data are written with the etcd v2 API;
- The Pod network ${CLUSTER_CIDR} written here must have a prefix length (e.g. /16) smaller than SubnetLen, and must match the value of kube-controller-manager's --cluster-cidr parameter;
Create the flanneld systemd unit file
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > flanneld.service << EOF
[Unit]
Description=Flanneld overlay address etcd agent
After=network.target
After=network-online.target
Wants=network-online.target
After=etcd.service
Before=docker.service
[Service]
Type=notify
ExecStart=/opt/k8s/bin/flanneld \\
-etcd-cafile=/etc/kubernetes/cert/ca.pem \\
-etcd-certfile=/etc/flanneld/cert/flanneld.pem \\
-etcd-keyfile=/etc/flanneld/cert/flanneld-key.pem \\
-etcd-endpoints=${ETCD_ENDPOINTS} \\
-etcd-prefix=${FLANNEL_ETCD_PREFIX} \\
-iface=${IFACE} \\
-ip-masq
ExecStartPost=/opt/k8s/bin/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker
Restart=always
RestartSec=5
StartLimitInterval=0
[Install]
WantedBy=multi-user.target
RequiredBy=docker.service
EOF
- The mk-docker-opts.sh script writes the Pod subnet allocated to flanneld into the /run/flannel/docker file; docker later uses the environment variables in this file to configure the docker0 bridge;
- flanneld communicates with the other nodes through the interface carrying the system default route; for nodes with multiple network interfaces (e.g. internal and public), the -iface parameter selects the communication interface;
- flanneld needs root privileges to run;
- -ip-masq: flanneld sets up SNAT rules for traffic leaving the Pod network, and sets the --ip-masq variable passed to Docker (in the /run/flannel/docker file) to false so that Docker no longer creates its own SNAT rules. The SNAT rule Docker creates when its --ip-masq is true is rather blunt: it SNATs every request originating from local Pods that is not destined for the docker0 interface, so traffic to Pods on other nodes appears to come from the flannel.1 interface IP and the destination Pod cannot see the real source Pod IP. The SNAT rule flanneld creates is gentler: it only SNATs requests leaving the Pod network.
Distribute the flanneld systemd unit file to all nodes
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp flanneld.service root@${node_ip}:/etc/systemd/system/
done
Start the flanneld service
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable flanneld && systemctl restart flanneld"
done
Check the startup result
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status flanneld|grep Active"
done
Make sure the status is active (running); otherwise check the logs to find the cause:
journalctl -u flanneld
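Once flanneld is running, the SNAT rule it created for traffic leaving the Pod network can be inspected (a rough check; the exact rules vary by flannel version):
iptables -t nat -S POSTROUTING | grep 172.30.0.0/16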
Check the Pod subnets allocated to each flanneld
View the cluster Pod network (/16):
source /opt/k8s/bin/environment.sh
etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--ca-file=/etc/kubernetes/cert/ca.pem \
--cert-file=/etc/flanneld/cert/flanneld.pem \
--key-file=/etc/flanneld/cert/flanneld-key.pem \
get ${FLANNEL_ETCD_PREFIX}/config
Output:
{"Network":"172.30.0.0/16", "SubnetLen": 21, "Backend": {"Type": "vxlan"}}
View the list of allocated Pod subnets (each a /21 here):
source /opt/k8s/bin/environment.sh
etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--ca-file=/etc/kubernetes/cert/ca.pem \
--cert-file=/etc/flanneld/cert/flanneld.pem \
--key-file=/etc/flanneld/cert/flanneld-key.pem \
ls ${FLANNEL_ETCD_PREFIX}/subnets
Output (results depend on your deployment):
/kubernetes/network/subnets/172.30.24.0-21
/kubernetes/network/subnets/172.30.40.0-21
/kubernetes/network/subnets/172.30.200.0-21
View the node IP and flannel interface address corresponding to a given Pod subnet:
source /opt/k8s/bin/environment.sh
etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--ca-file=/etc/kubernetes/cert/ca.pem \
--cert-file=/etc/flanneld/cert/flanneld.pem \
--key-file=/etc/flanneld/cert/flanneld-key.pem \
get ${FLANNEL_ETCD_PREFIX}/subnets/172.30.24.0-21
Output (results depend on your deployment):
{"PublicIP":"192.168.75.110","BackendType":"vxlan","BackendData":{"VtepMAC":"62:08:2f:f4:b8:a9"}}
- 172.30.24.0/21 is allocated to node kube-node1 (192.168.75.110);
- VtepMAC is the MAC address of kube-node1's flannel.1 interface;
Check the node's flannel network information
[root@kube-node1 work]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:4f:53:fa brd ff:ff:ff:ff:ff:ff
inet 192.168.75.110/24 brd 192.168.75.255 scope global ens33
valid_lft forever preferred_lft forever
3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 62:08:2f:f4:b8:a9 brd ff:ff:ff:ff:ff:ff
inet 172.30.24.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
- The flannel.1 interface's address is the first IP (.0) of the allocated Pod subnet, with a /32 mask;
[root@kube-node1 work]# ip route show |grep flannel.1
172.30.40.0/21 via 172.30.40.0 dev flannel.1 onlink
172.30.200.0/21 via 172.30.200.0 dev flannel.1 onlink
- Requests to other nodes' Pod subnets are all forwarded to the flannel.1 interface;
- flanneld uses the subnet information in etcd, e.g. ${FLANNEL_ETCD_PREFIX}/subnets/172.30.24.0-21, to decide which node's interconnect IP a request should be sent to;
Verify that the nodes can reach each other over the Pod network
After flannel is deployed on all nodes, check that the flannel interface was created (its name may be flannel0, flannel.0, flannel.1, etc.):
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh ${node_ip} "/usr/sbin/ip addr show flannel.1|grep -w inet"
done
Output:
>>> 192.168.75.110
inet 172.30.24.0/32 scope global flannel.1
>>> 192.168.75.111
inet 172.30.40.0/32 scope global flannel.1
>>> 192.168.75.112
inet 172.30.200.0/32 scope global flannel.1
From each node, ping all the flannel interface IPs and make sure they respond:
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh ${node_ip} "ping -c 1 172.30.80.0"
ssh ${node_ip} "ping -c 1 172.30.32.0"
ssh ${node_ip} "ping -c 1 172.30.184.0"
done
06-0 kube-apiserver High Availability with an nginx Proxy
This document explains how to use nginx's layer-4 transparent proxying to give K8S nodes (master and worker) highly available access to kube-apiserver.
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.
The nginx-proxy-based kube-apiserver HA scheme
- The control plane's kube-controller-manager and kube-scheduler run as multiple instances, so as long as one instance is healthy they remain highly available;
- Pods inside the cluster access kube-apiserver through the K8S service domain name kubernetes; kube-dns automatically resolves it to the IPs of the multiple kube-apiserver nodes, so this path is highly available too;
- Each node runs an nginx process whose backends are the multiple apiserver instances; nginx health-checks and load-balances across them;
- kubelet, kube-proxy, controller-manager, and scheduler access kube-apiserver through the local nginx (listening on 127.0.0.1), making kube-apiserver highly available;
Download and build nginx
Download the source:
cd /opt/k8s/work
wget http://nginx.org/download/nginx-1.15.3.tar.gz
tar -xzvf nginx-1.15.3.tar.gz
Configure the build:
cd /opt/k8s/work/nginx-1.15.3
mkdir nginx-prefix
./configure --with-stream --without-http --prefix=$(pwd)/nginx-prefix --without-http_uwsgi_module --without-http_scgi_module --without-http_fastcgi_module
- --with-stream: enables layer-4 transparent forwarding (TCP proxy);
- --without-xxx: disables all other features, so the resulting dynamically linked binary has minimal dependencies;
Output:
Configuration summary
+ PCRE library is not used
+ OpenSSL library is not used
+ zlib library is not used
nginx path prefix: "/root/tmp/nginx-1.15.3/nginx-prefix"
nginx binary file: "/root/tmp/nginx-1.15.3/nginx-prefix/sbin/nginx"
nginx modules path: "/root/tmp/nginx-1.15.3/nginx-prefix/modules"
nginx configuration prefix: "/root/tmp/nginx-1.15.3/nginx-prefix/conf"
nginx configuration file: "/root/tmp/nginx-1.15.3/nginx-prefix/conf/nginx.conf"
nginx pid file: "/root/tmp/nginx-1.15.3/nginx-prefix/logs/nginx.pid"
nginx error log file: "/root/tmp/nginx-1.15.3/nginx-prefix/logs/error.log"
nginx http access log file: "/root/tmp/nginx-1.15.3/nginx-prefix/logs/access.log"
nginx http client request body temporary files: "client_body_temp"
nginx http proxy temporary files: "proxy_temp"
Build and install:
cd /opt/k8s/work/nginx-1.15.3
make && make install
Verify the built nginx
cd /opt/k8s/work/nginx-1.15.3
./nginx-prefix/sbin/nginx -v
Output:
nginx version: nginx/1.15.3
Inspect nginx's dynamically linked libraries:
$ ldd ./nginx-prefix/sbin/nginx
Output:
linux-vdso.so.1 => (0x00007ffc945e7000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f4385072000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4384e56000)
libc.so.6 => /lib64/libc.so.6 (0x00007f4384a89000)
/lib64/ld-linux-x86-64.so.2 (0x00007f4385276000)
- Since only layer-4 transparent forwarding is enabled, the binary depends only on core OS libraries such as libc and not on other libraries (libz, libssl, etc.), which makes it easy to deploy on various OS versions;
Install and deploy nginx
Create the directory structure:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /opt/k8s/kube-nginx/{conf,logs,sbin}"
done
Copy the binary:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /opt/k8s/kube-nginx/{conf,logs,sbin}"
scp /opt/k8s/work/nginx-1.15.3/nginx-prefix/sbin/nginx root@${node_ip}:/opt/k8s/kube-nginx/sbin/kube-nginx
ssh root@${node_ip} "chmod a+x /opt/k8s/kube-nginx/sbin/*"
done
- The binary is renamed to kube-nginx;
Configure nginx, enabling layer-4 transparent forwarding:
cd /opt/k8s/work
cat > kube-nginx.conf << \EOF
worker_processes 1;
events {
worker_connections 1024;
}
stream {
upstream backend {
hash $remote_addr consistent;
server 192.168.75.110:6443 max_fails=3 fail_timeout=30s;
server 192.168.75.111:6443 max_fails=3 fail_timeout=30s;
server 192.168.75.112:6443 max_fails=3 fail_timeout=30s;
}
server {
listen 127.0.0.1:8443;
proxy_connect_timeout 1s;
proxy_pass backend;
}
}
EOF
- Replace the server list in the backend block to match your cluster's actual kube-apiserver instances; the lines can also be generated, as sketched below;
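Rather than editing the server list by hand, the lines can be generated from the NODE_IPS array in environment.sh (a minimal sketch; paste the output into the upstream block):
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo "        server ${node_ip}:6443        max_fails=3 fail_timeout=30s;"
done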
Distribute the configuration file:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-nginx.conf root@${node_ip}:/opt/k8s/kube-nginx/conf/kube-nginx.conf
done
Configure the systemd unit file and start the service
Configure the kube-nginx systemd unit file:
cd /opt/k8s/work
cat > kube-nginx.service <<EOF
[Unit]
Description=kube-apiserver nginx proxy
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
Type=forking
ExecStartPre=/opt/k8s/kube-nginx/sbin/kube-nginx -c /opt/k8s/kube-nginx/conf/kube-nginx.conf -p /opt/k8s/kube-nginx -t
ExecStart=/opt/k8s/kube-nginx/sbin/kube-nginx -c /opt/k8s/kube-nginx/conf/kube-nginx.conf -p /opt/k8s/kube-nginx
ExecReload=/opt/k8s/kube-nginx/sbin/kube-nginx -c /opt/k8s/kube-nginx/conf/kube-nginx.conf -p /opt/k8s/kube-nginx -s reload
PrivateTmp=true
Restart=always
RestartSec=5
StartLimitInterval=0
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
Distribute the systemd unit file:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-nginx.service root@${node_ip}:/etc/systemd/system/
done
Start the kube-nginx service:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-nginx && systemctl restart kube-nginx"
done
Check the kube-nginx service status
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status kube-nginx |grep 'Active:'"
done
Make sure the status is active (running); otherwise check the logs to find the cause:
journalctl -u kube-nginx
06-1. Deploy the Master Nodes
The kubernetes master nodes run the following components:
- kube-apiserver
- kube-scheduler
- kube-controller-manager
- kube-nginx
kube-apiserver, kube-scheduler, and kube-controller-manager all run as multiple instances:
- kube-scheduler and kube-controller-manager automatically elect a leader instance; the other instances block. When the leader dies, a new leader is elected, keeping the service available;
- kube-apiserver is stateless and is accessed through the kube-nginx proxy, keeping the service available;
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.
Install and configure kube-nginx
See 06-0.apiserver高可用之nginx代理.md
Download the latest binaries
Download the binary tar file from the CHANGELOG page and unpack it:
cd /opt/k8s/work
# If the direct download is slow, download with a download manager and upload. Note that some download managers (e.g. Xunlei) save the file as kubernetes-server-linux-amd64.tar.tar — the suffix is not gz, so rename it before use
wget https://dl.k8s.io/v1.14.2/kubernetes-server-linux-amd64.tar.gz
tar -xzvf kubernetes-server-linux-amd64.tar.gz
cd kubernetes
tar -xzvf kubernetes-src.tar.gz
Copy the binaries to all master nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kubernetes/server/bin/{apiextensions-apiserver,cloud-controller-manager,kube-apiserver,kube-controller-manager,kube-proxy,kube-scheduler,kubeadm,kubectl,kubelet,mounter} root@${node_ip}:/opt/k8s/bin/
ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
done
06-2. Deploy the Highly Available kube-apiserver Cluster
This document explains how to deploy a three-instance kube-apiserver cluster, accessed through the kube-nginx proxy to keep the service available.
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.
Preparation
For downloading the latest binaries and installing and configuring flanneld, see: 06-1.部署master節點.md
Create the kubernetes certificate and private key
Create the certificate signing request:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > kubernetes-csr.json <<EOF
{
"CN": "kubernetes",
"hosts": [
"127.0.0.1",
"172.27.137.240",
"172.27.137.239",
"172.27.137.238",
"${CLUSTER_KUBERNETES_SVC_IP}",
"kubernetes",
"kubernetes.default",
"kubernetes.default.svc",
"kubernetes.default.svc.cluster",
"kubernetes.default.svc.cluster.local."
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "4Paradigm"
}
]
}
EOF
- The hosts field lists the IPs and domain names authorized to use this certificate; here it includes the master node IPs, the kubernetes service IP, and its domain names;
- The kubernetes service IP is created automatically by the apiserver, usually the first IP of the network specified by the --service-cluster-ip-range parameter; it can be retrieved later with:
$ kubectl get svc kubernetes
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   10.254.0.1   <none>        443/TCP   1d
Generate the certificate and private key:
cfssl gencert -ca=/opt/k8s/work/ca.pem \
-ca-key=/opt/k8s/work/ca-key.pem \
-config=/opt/k8s/work/ca-config.json \
-profile=kubernetes kubernetes-csr.json | cfssljson -bare kubernetes
ls kubernetes*pem
Copy the generated certificate and private key files to all master nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /etc/kubernetes/cert"
scp kubernetes*.pem root@${node_ip}:/etc/kubernetes/cert/
done
Create the encryption configuration file
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > encryption-config.yaml <<EOF
kind: EncryptionConfig
apiVersion: v1
resources:
- resources:
- secrets
providers:
- aescbc:
keys:
- name: key1
secret: ${ENCRYPTION_KEY}
- identity: {}
EOF
Copy the encryption configuration file to the /etc/kubernetes directory on the master nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp encryption-config.yaml root@${node_ip}:/etc/kubernetes/
done
Create the audit policy file
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > audit-policy.yaml <<EOF
apiVersion: audit.k8s.io/v1beta1
kind: Policy
rules:
# The following requests were manually identified as high-volume and low-risk, so drop them.
- level: None
resources:
- group: ""
resources:
- endpoints
- services
- services/status
users:
- 'system:kube-proxy'
verbs:
- watch
- level: None
resources:
- group: ""
resources:
- nodes
- nodes/status
userGroups:
- 'system:nodes'
verbs:
- get
- level: None
namespaces:
- kube-system
resources:
- group: ""
resources:
- endpoints
users:
- 'system:kube-controller-manager'
- 'system:kube-scheduler'
- 'system:serviceaccount:kube-system:endpoint-controller'
verbs:
- get
- update
- level: None
resources:
- group: ""
resources:
- namespaces
- namespaces/status
- namespaces/finalize
users:
- 'system:apiserver'
verbs:
- get
# Don't log HPA fetching metrics.
- level: None
resources:
- group: metrics.k8s.io
users:
- 'system:kube-controller-manager'
verbs:
- get
- list
# Don't log these read-only URLs.
- level: None
nonResourceURLs:
- '/healthz*'
- /version
- '/swagger*'
# Don't log events requests.
- level: None
resources:
- group: ""
resources:
- events
# node and pod status calls from nodes are high-volume and can be large, don't log responses for expected updates from nodes
- level: Request
omitStages:
- RequestReceived
resources:
- group: ""
resources:
- nodes/status
- pods/status
users:
- kubelet
- 'system:node-problem-detector'
- 'system:serviceaccount:kube-system:node-problem-detector'
verbs:
- update
- patch
- level: Request
omitStages:
- RequestReceived
resources:
- group: ""
resources:
- nodes/status
- pods/status
userGroups:
- 'system:nodes'
verbs:
- update
- patch
# deletecollection calls can be large, don't log responses for expected namespace deletions
- level: Request
omitStages:
- RequestReceived
users:
- 'system:serviceaccount:kube-system:namespace-controller'
verbs:
- deletecollection
# Secrets, ConfigMaps, and TokenReviews can contain sensitive & binary data,
# so only log at the Metadata level.
- level: Metadata
omitStages:
- RequestReceived
resources:
- group: ""
resources:
- secrets
- configmaps
- group: authentication.k8s.io
resources:
- tokenreviews
# Get repsonses can be large; skip them.
- level: Request
omitStages:
- RequestReceived
resources:
- group: ""
- group: admissionregistration.k8s.io
- group: apiextensions.k8s.io
- group: apiregistration.k8s.io
- group: apps
- group: authentication.k8s.io
- group: authorization.k8s.io
- group: autoscaling
- group: batch
- group: certificates.k8s.io
- group: extensions
- group: metrics.k8s.io
- group: networking.k8s.io
- group: policy
- group: rbac.authorization.k8s.io
- group: scheduling.k8s.io
- group: settings.k8s.io
- group: storage.k8s.io
verbs:
- get
- list
- watch
# Default level for known APIs
- level: RequestResponse
omitStages:
- RequestReceived
resources:
- group: ""
- group: admissionregistration.k8s.io
- group: apiextensions.k8s.io
- group: apiregistration.k8s.io
- group: apps
- group: authentication.k8s.io
- group: authorization.k8s.io
- group: autoscaling
- group: batch
- group: certificates.k8s.io
- group: extensions
- group: metrics.k8s.io
- group: networking.k8s.io
- group: policy
- group: rbac.authorization.k8s.io
- group: scheduling.k8s.io
- group: settings.k8s.io
- group: storage.k8s.io
# Default level for all other requests.
- level: Metadata
omitStages:
- RequestReceived
EOF
Distribute the audit policy file:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp audit-policy.yaml root@${node_ip}:/etc/kubernetes/audit-policy.yaml
done
Create the certificate used later to access metrics-server
Create the certificate signing request:
cat > proxy-client-csr.json <<EOF
{
"CN": "aggregator",
"hosts": [],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "4Paradigm"
}
]
}
EOF
- The CN name must appear in kube-apiserver's --requestheader-allowed-names parameter, otherwise later metrics queries are rejected with a permissions error.
Generate the certificate and private key:
cfssl gencert -ca=/etc/kubernetes/cert/ca.pem \
-ca-key=/etc/kubernetes/cert/ca-key.pem \
-config=/etc/kubernetes/cert/ca-config.json \
-profile=kubernetes proxy-client-csr.json | cfssljson -bare proxy-client
ls proxy-client*.pem
Copy the generated certificate and private key files to all master nodes:
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp proxy-client*.pem root@${node_ip}:/etc/kubernetes/cert/
done
Create the kube-apiserver systemd unit template file
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > kube-apiserver.service.template <<EOF
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
[Service]
WorkingDirectory=${K8S_DIR}/kube-apiserver
ExecStart=/opt/k8s/bin/kube-apiserver \\
--advertise-address=##NODE_IP## \\
--default-not-ready-toleration-seconds=360 \\
--default-unreachable-toleration-seconds=360 \\
--feature-gates=DynamicAuditing=true \\
--max-mutating-requests-inflight=2000 \\
--max-requests-inflight=4000 \\
--default-watch-cache-size=200 \\
--delete-collection-workers=2 \\
--encryption-provider-config=/etc/kubernetes/encryption-config.yaml \\
--etcd-cafile=/etc/kubernetes/cert/ca.pem \\
--etcd-certfile=/etc/kubernetes/cert/kubernetes.pem \\
--etcd-keyfile=/etc/kubernetes/cert/kubernetes-key.pem \\
--etcd-servers=${ETCD_ENDPOINTS} \\
--bind-address=##NODE_IP## \\
--secure-port=6443 \\
--tls-cert-file=/etc/kubernetes/cert/kubernetes.pem \\
--tls-private-key-file=/etc/kubernetes/cert/kubernetes-key.pem \\
--insecure-port=0 \\
--audit-dynamic-configuration \\
--audit-log-maxage=15 \\
--audit-log-maxbackup=3 \\
--audit-log-maxsize=100 \\
--audit-log-truncate-enabled \\
--audit-log-path=${K8S_DIR}/kube-apiserver/audit.log \\
--audit-policy-file=/etc/kubernetes/audit-policy.yaml \\
--profiling \\
--anonymous-auth=false \\
--client-ca-file=/etc/kubernetes/cert/ca.pem \\
--enable-bootstrap-token-auth \\
--requestheader-allowed-names="aggregator" \\
--requestheader-client-ca-file=/etc/kubernetes/cert/ca.pem \\
--requestheader-extra-headers-prefix="X-Remote-Extra-" \\
--requestheader-group-headers=X-Remote-Group \\
--requestheader-username-headers=X-Remote-User \\
--service-account-key-file=/etc/kubernetes/cert/ca.pem \\
--authorization-mode=Node,RBAC \\
--runtime-config=api/all=true \\
--enable-admission-plugins=NodeRestriction \\
--allow-privileged=true \\
--apiserver-count=3 \\
--event-ttl=168h \\
--kubelet-certificate-authority=/etc/kubernetes/cert/ca.pem \\
--kubelet-client-certificate=/etc/kubernetes/cert/kubernetes.pem \\
--kubelet-client-key=/etc/kubernetes/cert/kubernetes-key.pem \\
--kubelet-https=true \\
--kubelet-timeout=10s \\
--proxy-client-cert-file=/etc/kubernetes/cert/proxy-client.pem \\
--proxy-client-key-file=/etc/kubernetes/cert/proxy-client-key.pem \\
--service-cluster-ip-range=${SERVICE_CIDR} \\
--service-node-port-range=${NODE_PORT_RANGE} \\
--logtostderr=true \\
--v=2
Restart=on-failure
RestartSec=10
Type=notify
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
- --advertise-address: the IP the apiserver advertises (the kubernetes service's backend node IP);
- --default-*-toleration-seconds: thresholds related to node problems;
- --max-*-requests-inflight: maximums for in-flight requests;
- --etcd-*: certificates for accessing etcd and the etcd server addresses;
- --encryption-provider-config: the configuration for encrypting secrets in etcd;
- --bind-address: the https listen IP; it cannot be 127.0.0.1, otherwise the secure port 6443 would be unreachable from outside;
- --secure-port: the https listen port;
- --insecure-port=0: disables the http insecure port (8080);
- --tls-*-file: the certificate, private key, and CA file the apiserver uses;
- --audit-*: parameters for the audit policy and audit log file;
- --client-ca-file: verifies the certificates presented by clients (kube-controller-manager, kube-scheduler, kubelet, kube-proxy, etc.);
- --enable-bootstrap-token-auth: enables token authentication for kubelet bootstrap;
- --requestheader-*: kube-apiserver aggregator layer parameters, needed by proxy-client & HPA;
- --requestheader-client-ca-file: the CA that signed the certificates specified by --proxy-client-cert-file and --proxy-client-key-file; used when the metrics aggregator is enabled;
- --requestheader-allowed-names: must not be empty; a comma-separated list of the CN names of the --proxy-client-cert-file certificates, set to "aggregator" here;
- --service-account-key-file: the public key file for signing ServiceAccount tokens; kube-controller-manager's --service-account-private-key-file specifies the matching private key file; the two are used as a pair;
- --runtime-config=api/all=true: enables all API versions, e.g. autoscaling/v2alpha1;
- --authorization-mode=Node,RBAC, --anonymous-auth=false: enables Node and RBAC authorization modes and rejects unauthorized requests;
- --enable-admission-plugins: enables some plugins that are off by default;
- --allow-privileged: allows containers to run with privileged permissions;
- --apiserver-count=3: the number of apiserver instances;
- --event-ttl: how long events are retained;
- --kubelet-*: if set, the kubelet APIs are accessed over https; RBAC rules must be defined for the user corresponding to the certificate (the kubernetes*.pem certificate above uses the kubernetes user), otherwise kubelet API access is denied as unauthorized;
- --proxy-client-*: the certificate the apiserver uses to access metrics-server;
- --service-cluster-ip-range: the Service cluster IP range;
- --service-node-port-range: the NodePort port range;
If the kube-apiserver machine does not run kube-proxy, the --enable-aggregator-routing=true parameter must also be added;
For the --requestheader-XXX parameters, see:
- https://github.com/kubernetes-incubator/apiserver-builder/blob/master/docs/concepts/auth.md
- https://docs.bitnami.com/kubernetes/how-to/configure-autoscaling-custom-metrics/
Note:
- The CA certificate specified by requestheader-client-ca-file must be valid for both client auth and server auth;
- If --requestheader-allowed-names is not empty and the CN name of the --proxy-client-cert-file certificate is not in allowed-names, later queries of node or pod metrics fail with:
[root@zhangjun-k8s01 1.8+]# kubectl top nodes
Error from server (Forbidden): nodes.metrics.k8s.io is forbidden: User "aggregator" cannot list resource "nodes" in API group "metrics.k8s.io" at the cluster scope
Create and distribute the kube-apiserver systemd unit file for each node
Substitute the template variables to generate a systemd unit file per node:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for (( i=0; i < 3; i++ ))
do
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-apiserver.service.template > kube-apiserver-${NODE_IPS[i]}.service
done
ls kube-apiserver*.service
- NODE_NAMES and NODE_IPS are bash arrays of equal length holding the node names and their corresponding IPs;
Distribute the generated systemd unit files:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-apiserver-${node_ip}.service root@${node_ip}:/etc/systemd/system/kube-apiserver.service
done
- The file is renamed to kube-apiserver.service;
Start the kube-apiserver service
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-apiserver"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-apiserver && systemctl restart kube-apiserver"
done
- The working directory must be created before starting the service;
Check the kube-apiserver status
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status kube-apiserver |grep 'Active:'"
done
Make sure the status is active (running); otherwise check the logs to find the cause:
journalctl -u kube-apiserver
Print the data kube-apiserver wrote into etcd
source /opt/k8s/bin/environment.sh
ETCDCTL_API=3 etcdctl \
--endpoints=${ETCD_ENDPOINTS} \
--cacert=/opt/k8s/work/ca.pem \
--cert=/opt/k8s/work/etcd.pem \
--key=/opt/k8s/work/etcd-key.pem \
get /registry/ --prefix --keys-only
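Since the aescbc provider was configured above, secrets should be stored encrypted. A quick check (an optional sketch; the test-enc name is arbitrary) is to create a secret and read its raw value from etcd; the value should start with the k8s:enc:aescbc:v1:key1: prefix rather than plaintext:
kubectl create secret generic test-enc --from-literal=foo=bar
ETCDCTL_API=3 etcdctl \
  --endpoints=${ETCD_ENDPOINTS} \
  --cacert=/opt/k8s/work/ca.pem \
  --cert=/opt/k8s/work/etcd.pem \
  --key=/opt/k8s/work/etcd-key.pem \
  get /registry/secrets/default/test-enc
kubectl delete secret test-enc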
Check cluster information
$ kubectl cluster-info
Kubernetes master is running at https://127.0.0.1:8443
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
$ kubectl get all --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.254.0.1 <none> 443/TCP 12m
$ kubectl get componentstatuses
NAME STATUS MESSAGE ERROR
controller-manager Unhealthy Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
scheduler Unhealthy Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}
- If running kubectl produces the following error, the ~/.kube/config file in use is wrong; check that the file exists and that none of its parameters are missing values, then rerun the command:
The connection to the server localhost:8080 was refused - did you specify the right host or port?
- When running kubectl get componentstatuses, the apiserver sends requests to 127.0.0.1 by default. When controller-manager and scheduler run in cluster mode they may not be on the same machine as kube-apiserver, in which case their status shows Unhealthy even though they are actually working fine.
Check the port kube-apiserver listens on
$ sudo netstat -lnpt|grep kube
tcp 0 0 192.168.75.110:6443 0.0.0.0:* LISTEN 101442/kube-apiserv
- 6443: the secure port for https requests; all requests are authenticated and authorized;
- Since the insecure port is disabled, nothing listens on 8080;
Grant kube-apiserver permission to access the kubelet API
When running kubectl exec, run, logs, and similar commands, the apiserver forwards the requests to the kubelet's https port. The following RBAC rule authorizes the user name of the apiserver's certificate (kubernetes.pem, CN: kubernetes) to access the kubelet API:
kubectl create clusterrolebinding kube-apiserver:kubelet-apis --clusterrole=system:kubelet-api-admin --user kubernetes
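The binding can be confirmed afterwards (an optional check):
kubectl describe clusterrolebinding kube-apiserver:kubelet-apis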
06-3. Deploy the Highly Available kube-controller-manager Cluster
This document describes how to deploy a highly available kube-controller-manager cluster.
The cluster contains 3 nodes; after startup they elect a leader through a competitive election, and the other nodes block. When the leader node becomes unavailable, the blocked nodes elect a new leader, keeping the service available.
To secure communication, this document first generates an x509 certificate and private key; kube-controller-manager uses the certificate in two situations:
- communicating with kube-apiserver's secure port;
- serving prometheus-format metrics on the secure port (https, 10252);
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.
Preparation
For downloading the latest binaries and installing and configuring flanneld, see: 06-1.部署master節點.md.
Create the kube-controller-manager certificate and private key
Create the certificate signing request:
cd /opt/k8s/work
cat > kube-controller-manager-csr.json <<EOF
{
"CN": "system:kube-controller-manager",
"key": {
"algo": "rsa",
"size": 2048
},
"hosts": [
"127.0.0.1",
"172.27.137.240",
"172.27.137.239",
"172.27.137.238"
],
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "system:kube-controller-manager",
"OU": "4Paradigm"
}
]
}
EOF
- The hosts list contains all kube-controller-manager node IPs;
- CN and O are both system:kube-controller-manager; the kubernetes built-in ClusterRoleBinding system:kube-controller-manager grants kube-controller-manager the permissions it needs.
Generate the certificate and private key:
cd /opt/k8s/work
cfssl gencert -ca=/opt/k8s/work/ca.pem \
-ca-key=/opt/k8s/work/ca-key.pem \
-config=/opt/k8s/work/ca-config.json \
-profile=kubernetes kube-controller-manager-csr.json | cfssljson -bare kube-controller-manager
ls kube-controller-manager*pem
Distribute the generated certificate and private key to all master nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-controller-manager*.pem root@${node_ip}:/etc/kubernetes/cert/
done
Create and distribute the kubeconfig file
kube-controller-manager uses a kubeconfig file to access the apiserver; the file provides the apiserver address, the embedded CA certificate, and the kube-controller-manager certificate:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
kubectl config set-cluster kubernetes \
--certificate-authority=/opt/k8s/work/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kube-controller-manager.kubeconfig
kubectl config set-credentials system:kube-controller-manager \
--client-certificate=kube-controller-manager.pem \
--client-key=kube-controller-manager-key.pem \
--embed-certs=true \
--kubeconfig=kube-controller-manager.kubeconfig
kubectl config set-context system:kube-controller-manager \
--cluster=kubernetes \
--user=system:kube-controller-manager \
--kubeconfig=kube-controller-manager.kubeconfig
kubectl config use-context system:kube-controller-manager --kubeconfig=kube-controller-manager.kubeconfig
Distribute the kubeconfig to all master nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-controller-manager.kubeconfig root@${node_ip}:/etc/kubernetes/
done
Create the kube-controller-manager systemd unit template file
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > kube-controller-manager.service.template <<EOF
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
[Service]
WorkingDirectory=${K8S_DIR}/kube-controller-manager
ExecStart=/opt/k8s/bin/kube-controller-manager \\
--profiling \\
--cluster-name=kubernetes \\
--controllers=*,bootstrapsigner,tokencleaner \\
--kube-api-qps=1000 \\
--kube-api-burst=2000 \\
--leader-elect \\
--use-service-account-credentials\\
--concurrent-service-syncs=2 \\
--bind-address=##NODE_IP## \\
--secure-port=10252 \\
--tls-cert-file=/etc/kubernetes/cert/kube-controller-manager.pem \\
--tls-private-key-file=/etc/kubernetes/cert/kube-controller-manager-key.pem \\
--port=0 \\
--authentication-kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \\
--client-ca-file=/etc/kubernetes/cert/ca.pem \\
--requestheader-allowed-names="" \\
--requestheader-client-ca-file=/etc/kubernetes/cert/ca.pem \\
--requestheader-extra-headers-prefix="X-Remote-Extra-" \\
--requestheader-group-headers=X-Remote-Group \\
--requestheader-username-headers=X-Remote-User \\
--authorization-kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \\
--cluster-signing-cert-file=/etc/kubernetes/cert/ca.pem \\
--cluster-signing-key-file=/etc/kubernetes/cert/ca-key.pem \\
--experimental-cluster-signing-duration=876000h \\
--horizontal-pod-autoscaler-sync-period=10s \\
--concurrent-deployment-syncs=10 \\
--concurrent-gc-syncs=30 \\
--node-cidr-mask-size=24 \\
--service-cluster-ip-range=${SERVICE_CIDR} \\
--pod-eviction-timeout=6m \\
--terminated-pod-gc-threshold=10000 \\
--root-ca-file=/etc/kubernetes/cert/ca.pem \\
--service-account-private-key-file=/etc/kubernetes/cert/ca-key.pem \\
--kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \\
--logtostderr=true \\
--v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
- --port=0: disables the insecure (http) port; this also makes the --address parameter ineffective while --bind-address takes effect;
- --secure-port=10252, --bind-address=0.0.0.0: serve https /metrics requests on port 10252 on all network interfaces;
- --kubeconfig: the kubeconfig file path; kube-controller-manager uses it to connect to and authenticate with kube-apiserver;
- --authentication-kubeconfig and --authorization-kubeconfig: kube-controller-manager uses them to connect to the apiserver and to authenticate and authorize client requests. kube-controller-manager no longer uses --tls-ca-file to verify the client certificates of https metrics requests. If these two kubeconfig parameters are not configured, client connections to kube-controller-manager's https port are rejected (with a permissions error);
- --cluster-signing-*-file: signs the certificates created by TLS Bootstrap;
- --experimental-cluster-signing-duration: the validity period of TLS Bootstrap certificates;
- --root-ca-file: the CA certificate placed into container ServiceAccounts, used to verify the kube-apiserver certificate;
- --service-account-private-key-file: the private key for signing ServiceAccount tokens; it must pair with the public key file specified by kube-apiserver's --service-account-key-file;
- --service-cluster-ip-range: the Service cluster IP range; must match the same parameter in kube-apiserver;
- --leader-elect=true: cluster mode with leader election; the node elected leader does the work while the other nodes block;
- --controllers=*,bootstrapsigner,tokencleaner: the list of enabled controllers; tokencleaner automatically cleans up expired Bootstrap tokens;
- --horizontal-pod-autoscaler-*: custom metrics parameters, supporting autoscaling/v2alpha1;
- --tls-cert-file, --tls-private-key-file: the server certificate and key used when serving metrics over https;
- --use-service-account-credentials=true: each controller inside kube-controller-manager accesses kube-apiserver with its own serviceaccount;
Create and distribute the kube-controller-manager systemd unit file for each node
Substitute the template variables to create a systemd unit file per node:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for (( i=0; i < 3; i++ ))
do
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-controller-manager.service.template > kube-controller-manager-${NODE_IPS[i]}.service
done
ls kube-controller-manager*.service
- NODE_NAMES and NODE_IPS are bash arrays of equal length holding the node names and their corresponding IPs;
Distribute to all master nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-controller-manager-${node_ip}.service root@${node_ip}:/etc/systemd/system/kube-controller-manager.service
done
- The file is renamed to kube-controller-manager.service;
Start the kube-controller-manager service
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-controller-manager"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-controller-manager && systemctl restart kube-controller-manager"
done
- The working directory must be created before starting the service;
Check the service status
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status kube-controller-manager|grep Active"
done
Make sure the status is active (running); otherwise check the logs to find the cause:
journalctl -u kube-controller-manager
kube-controller-manager listens on port 10252 and serves https requests:
[root@kube-node1 work]# netstat -lnpt | grep kube-cont
tcp 0 0 192.168.75.110:10252 0.0.0.0:* LISTEN 11439/kube-controll
View the exported metrics
Note: the following command is run on a kube-controller-manager node.
[root@kube-node1 work]# curl -s --cacert /opt/k8s/work/ca.pem --cert /opt/k8s/work/admin.pem --key /opt/k8s/work/admin-key.pem https://192.168.75.110:10252/metrics |head
# HELP ClusterRoleAggregator_adds (Deprecated) Total number of adds handled by workqueue: ClusterRoleAggregator
# TYPE ClusterRoleAggregator_adds counter
ClusterRoleAggregator_adds 13
# HELP ClusterRoleAggregator_depth (Deprecated) Current depth of workqueue: ClusterRoleAggregator
# TYPE ClusterRoleAggregator_depth gauge
ClusterRoleAggregator_depth 0
# HELP ClusterRoleAggregator_longest_running_processor_microseconds (Deprecated) How many microseconds has the longest running processor for ClusterRoleAggregator been running.
# TYPE ClusterRoleAggregator_longest_running_processor_microseconds gauge
ClusterRoleAggregator_longest_running_processor_microseconds 0
# HELP ClusterRoleAggregator_queue_latency (Deprecated) How long an item stays in workqueueClusterRoleAggregator before being requested.
Permissions of kube-controller-manager
The ClusterRole `system:kube-controller-manager` grants very few permissions: it can only create secrets, serviceaccounts, and similar resource objects. The permissions of the individual controllers are spread across the ClusterRoles `system:controller:XXX`:
[root@kube-node1 work]# kubectl describe clusterrole system:kube-controller-manager
Name: system:kube-controller-manager
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
secrets [] [] [create delete get update]
endpoints [] [] [create get update]
serviceaccounts [] [] [create get update]
events [] [] [create patch update]
tokenreviews.authentication.k8s.io [] [] [create]
subjectaccessreviews.authorization.k8s.io [] [] [create]
configmaps [] [] [get]
namespaces [] [] [get]
*.* [] [] [list watch]
The `--use-service-account-credentials=true` flag must be added to kube-controller-manager's startup parameters; the main controller then creates a ServiceAccount XXX-controller for each controller. The built-in ClusterRoleBinding system:controller:XXX grants each XXX-controller ServiceAccount the permissions of the corresponding ClusterRole system:controller:XXX. A quick check is sketched below.
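A small verification sketch (assuming kubectl is configured with the admin kubeconfig, as earlier in this document): each controller should get its own ServiceAccount in the kube-system namespace:
kubectl get serviceaccounts -n kube-system | grep -- '-controller'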
$ kubectl get clusterrole|grep controller
system:controller:attachdetach-controller 51m
system:controller:certificate-controller 51m
system:controller:clusterrole-aggregation-controller 51m
system:controller:cronjob-controller 51m
system:controller:daemon-set-controller 51m
system:controller:deployment-controller 51m
system:controller:disruption-controller 51m
system:controller:endpoint-controller 51m
system:controller:expand-controller 51m
system:controller:generic-garbage-collector 51m
system:controller:horizontal-pod-autoscaler 51m
system:controller:job-controller 51m
system:controller:namespace-controller 51m
system:controller:node-controller 51m
system:controller:persistent-volume-binder 51m
system:controller:pod-garbage-collector 51m
system:controller:pv-protection-controller 51m
system:controller:pvc-protection-controller 51m
system:controller:replicaset-controller 51m
system:controller:replication-controller 51m
system:controller:resourcequota-controller 51m
system:controller:route-controller 51m
system:controller:service-account-controller 51m
system:controller:service-controller 51m
system:controller:statefulset-controller 51m
system:controller:ttl-controller 51m
system:kube-controller-manager 51m
Take the deployment controller as an example:
$ kubectl describe clusterrole system:controller:deployment-controller
Name: system:controller:deployment-controller
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
replicasets.apps [] [] [create delete get list patch update watch]
replicasets.extensions [] [] [create delete get list patch update watch]
events [] [] [create patch update]
pods [] [] [get list update watch]
deployments.apps [] [] [get list update watch]
deployments.extensions [] [] [get list update watch]
deployments.apps/finalizers [] [] [update]
deployments.apps/status [] [] [update]
deployments.extensions/finalizers [] [] [update]
deployments.extensions/status [] [] [update]
Check the current leader
[root@kube-node1 work]# kubectl get endpoints kube-controller-manager --namespace=kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
annotations:
control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"kube-node3_ef7efd0f-0149-11ea-8f8a-000c291d1820","leaseDurationSeconds":15,"acquireTime":"2019-11-07T10:39:33Z","renewTime":"2019-11-07T10:43:10Z","leaderTransitions":2}'
creationTimestamp: "2019-11-07T10:32:42Z"
name: kube-controller-manager
namespace: kube-system
resourceVersion: "3766"
selfLink: /api/v1/namespaces/kube-system/endpoints/kube-controller-manager
uid: ee2f71e3-0149-11ea-98c9-000c291d1820
As shown by the holderIdentity above, the current leader is the kube-node3 node.
Test kube-controller-manager cluster high availability
Stop the kube-controller-manager service on one or two nodes and watch the logs on the other nodes to see whether one of them acquires leadership; a minimal test is sketched below.
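A minimal failover sketch, assuming the current leader is kube-node3 as shown above (adjust the node name to your environment):
ssh root@kube-node3 "systemctl stop kube-controller-manager"
# within ~15 seconds (leaseDurationSeconds) the annotation should show a new holderIdentity:
kubectl get endpoints kube-controller-manager --namespace=kube-system -o yaml | grep holderIdentity
# restore the stopped service afterwards:
ssh root@kube-node3 "systemctl start kube-controller-manager"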
References
- On controller permissions and the use-service-account-credentials flag: https://github.com/kubernetes/kubernetes/issues/48208
- kubelet authentication and authorization: https://kubernetes.io/docs/admin/kubelet-authentication-authorization/#kubelet-authorization
06-4. Deploy a highly available kube-scheduler cluster
This document describes the steps to deploy a highly available kube-scheduler cluster.
The cluster has 3 nodes. After startup, a leader node is elected through a competitive election mechanism while the other nodes block. When the leader becomes unavailable, the remaining nodes elect a new leader, keeping the service available.
To secure communication, this document first generates an x509 certificate and private key, which kube-scheduler uses in two cases:
- communicating with kube-apiserver's secure port;
- serving prometheus-format metrics on the secure port (https, 10259);
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, and files and commands are then distributed remotely.
Preparation
For downloading the latest binaries and installing/configuring flanneld, see: 06-1.部署master節點.md.
Create the kube-scheduler certificate and private key
Create the certificate signing request:
cd /opt/k8s/work
cat > kube-scheduler-csr.json <<EOF
{
"CN": "system:kube-scheduler",
"hosts": [
"127.0.0.1",
"192.168.75.110",
"192.168.75.111",
"192.168.75.112"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "system:kube-scheduler",
"OU": "4Paradigm"
}
]
}
EOF
- The hosts list contains the IPs of all kube-scheduler nodes;
- CN and O are both `system:kube-scheduler`; the built-in kubernetes ClusterRoleBinding `system:kube-scheduler` grants kube-scheduler the permissions it needs to work;
Generate the certificate and private key:
cd /opt/k8s/work
cfssl gencert -ca=/opt/k8s/work/ca.pem \
-ca-key=/opt/k8s/work/ca-key.pem \
-config=/opt/k8s/work/ca-config.json \
-profile=kubernetes kube-scheduler-csr.json | cfssljson -bare kube-scheduler
ls kube-scheduler*pem
Distribute the generated certificate and private key to all master nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-scheduler*.pem root@${node_ip}:/etc/kubernetes/cert/
done
Create and distribute the kubeconfig file
kube-scheduler uses a kubeconfig file to access the apiserver; the file provides the apiserver address, an embedded CA certificate, and the kube-scheduler certificate:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
kubectl config set-cluster kubernetes \
--certificate-authority=/opt/k8s/work/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kube-scheduler.kubeconfig
kubectl config set-credentials system:kube-scheduler \
--client-certificate=kube-scheduler.pem \
--client-key=kube-scheduler-key.pem \
--embed-certs=true \
--kubeconfig=kube-scheduler.kubeconfig
kubectl config set-context system:kube-scheduler \
--cluster=kubernetes \
--user=system:kube-scheduler \
--kubeconfig=kube-scheduler.kubeconfig
kubectl config use-context system:kube-scheduler --kubeconfig=kube-scheduler.kubeconfig
Distribute the kubeconfig to all master nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-scheduler.kubeconfig root@${node_ip}:/etc/kubernetes/
done
Create the kube-scheduler configuration file
cd /opt/k8s/work
cat >kube-scheduler.yaml.template <<EOF
apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
bindTimeoutSeconds: 600
clientConnection:
burst: 200
kubeconfig: "/etc/kubernetes/kube-scheduler.kubeconfig"
qps: 100
enableContentionProfiling: false
enableProfiling: true
hardPodAffinitySymmetricWeight: 1
healthzBindAddress: ##NODE_IP##:10251
leaderElection:
leaderElect: true
metricsBindAddress: ##NODE_IP##:10251
EOF
- `--kubeconfig`: path to the kubeconfig file kube-scheduler uses to connect to and authenticate with kube-apiserver;
- `--leader-elect=true`: cluster mode with leader election enabled; the node elected leader does the work while the other nodes block;
Substitute the variables in the template:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for (( i=0; i < 3; i++ ))
do
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-scheduler.yaml.template > kube-scheduler-${NODE_IPS[i]}.yaml
done
ls kube-scheduler*.yaml
- NODE_NAMES and NODE_IPS are bash arrays of equal length, holding the node names and the corresponding IPs;
Distribute the kube-scheduler configuration file to all master nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-scheduler-${node_ip}.yaml root@${node_ip}:/etc/kubernetes/kube-scheduler.yaml
done
- Renamed to kube-scheduler.yaml;
Create the kube-scheduler systemd unit template file
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > kube-scheduler.service.template <<EOF
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
[Service]
WorkingDirectory=${K8S_DIR}/kube-scheduler
ExecStart=/opt/k8s/bin/kube-scheduler \\
--config=/etc/kubernetes/kube-scheduler.yaml \\
--bind-address=##NODE_IP## \\
--secure-port=10259 \\
--port=0 \\
--tls-cert-file=/etc/kubernetes/cert/kube-scheduler.pem \\
--tls-private-key-file=/etc/kubernetes/cert/kube-scheduler-key.pem \\
--authentication-kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \\
--client-ca-file=/etc/kubernetes/cert/ca.pem \\
--requestheader-allowed-names="" \\
--requestheader-client-ca-file=/etc/kubernetes/cert/ca.pem \\
--requestheader-extra-headers-prefix="X-Remote-Extra-" \\
--requestheader-group-headers=X-Remote-Group \\
--requestheader-username-headers=X-Remote-User \\
--authorization-kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \\
--logtostderr=true \\
--v=2
Restart=always
RestartSec=5
StartLimitInterval=0
[Install]
WantedBy=multi-user.target
EOF
Create and distribute the kube-scheduler systemd unit file to each node
Substitute the variables in the template to create a systemd unit file for each node:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for (( i=0; i < 3; i++ ))
do
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-scheduler.service.template > kube-scheduler-${NODE_IPS[i]}.service
done
ls kube-scheduler*.service
- NODE_NAMES and NODE_IPS are bash arrays of equal length, holding the node names and the corresponding IPs;
Distribute the systemd unit file to all master nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp kube-scheduler-${node_ip}.service root@${node_ip}:/etc/systemd/system/kube-scheduler.service
done
- Renamed to kube-scheduler.service;
Start the kube-scheduler service
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-scheduler"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-scheduler && systemctl restart kube-scheduler"
done
- The working directory must be created before starting the service;
Check the service status
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status kube-scheduler|grep Active"
done
Make sure the state is `active (running)`; otherwise check the logs to find the cause:
journalctl -u kube-scheduler
Check the exported metrics
Note: the following commands are executed on the kube-scheduler nodes.
kube-scheduler listens on ports 10251 and 10259:
- 10251: accepts http requests; insecure port, no authentication or authorization required;
- 10259: accepts https requests; secure port, authentication and authorization required;
Both ports expose `/metrics` and `/healthz`.
[root@kube-node1 work]# netstat -lnpt |grep kube-sch
tcp 0 0 192.168.75.110:10259 0.0.0.0:* LISTEN 17034/kube-schedule
tcp 0 0 192.168.75.110:10251 0.0.0.0:* LISTEN 17034/kube-schedule
[root@kube-node1 work]# curl -s http://192.168.75.110:10251/metrics |head
# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
[root@kube-node1 work]# curl -s --cacert /opt/k8s/work/ca.pem --cert /opt/k8s/work/admin.pem --key /opt/k8s/work/admin-key.pem https://192.168.75.110:10259/metrics |head
# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
Check the current leader
[root@kube-node1 work]# kubectl get endpoints kube-scheduler --namespace=kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
annotations:
control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"kube-node1_a0c24012-0152-11ea-9e7b-000c294f53fa","leaseDurationSeconds":15,"acquireTime":"2019-11-07T11:34:59Z","renewTime":"2019-11-07T11:39:36Z","leaderTransitions":0}'
creationTimestamp: "2019-11-07T11:34:57Z"
name: kube-scheduler
namespace: kube-system
resourceVersion: "6598"
selfLink: /api/v1/namespaces/kube-system/endpoints/kube-scheduler
uid: a00f12ce-0152-11ea-98c9-000c291d1820
As shown by the holderIdentity above, the current leader is the kube-node1 node.
Test kube-scheduler cluster high availability
Pick any one or two master nodes, stop the kube-scheduler service on them, and check whether another node acquires leadership; a minimal test is sketched below.
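A minimal failover sketch, assuming the current leader is kube-node1 as shown above (adjust the node name to your environment):
ssh root@kube-node1 "systemctl stop kube-scheduler"
# the annotation should soon show a new holderIdentity:
kubectl get endpoints kube-scheduler --namespace=kube-system -o yaml | grep holderIdentity
# restore the stopped service afterwards:
ssh root@kube-node1 "systemctl start kube-scheduler"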
07-0. Deploy worker nodes
The kubernetes worker nodes run the following components:
- docker
- kubelet
- kube-proxy
- flanneld
- kube-nginx
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, and files and commands are then distributed remotely.
Install and configure flanneld
Install and configure kube-nginx
See 06-0.apiserver高可用之nginx代理.md.
Install dependencies
CentOS:
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "yum install -y epel-release"
ssh root@${node_ip} "yum install -y conntrack ipvsadm ntp ntpdate ipset jq iptables curl sysstat libseccomp && modprobe ip_vs "
done
Ubuntu:
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "apt-get install -y conntrack ipvsadm ntp ntpdate ipset jq iptables curl sysstat libseccomp && modprobe ip_vs "
done
07-1. Deploy the docker component
docker runs and manages containers; kubelet interacts with it through the Container Runtime Interface (CRI).
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, and files and commands are then distributed remotely.
Install dependencies
Download and distribute the docker binaries
Download the latest release from the docker download page:
cd /opt/k8s/work
wget https://download.docker.com/linux/static/stable/x86_64/docker-18.09.6.tgz
tar -xvf docker-18.09.6.tgz
Distribute the binaries to all worker nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp docker/* root@${node_ip}:/opt/k8s/bin/
ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
done
Create and distribute the systemd unit file
cd /opt/k8s/work
cat > docker.service <<"EOF"
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.io
[Service]
WorkingDirectory=##DOCKER_DIR##
Environment="PATH=/opt/k8s/bin:/bin:/sbin:/usr/bin:/usr/sbin"
EnvironmentFile=-/run/flannel/docker
ExecStart=/opt/k8s/bin/dockerd $DOCKER_NETWORK_OPTIONS
ExecReload=/bin/kill -s HUP $MAINPID
Restart=on-failure
RestartSec=5
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
Delegate=yes
KillMode=process
[Install]
WantedBy=multi-user.target
EOF
- EOF is quoted ("EOF"), so bash does not substitute variables in the here-document, such as `$DOCKER_NETWORK_OPTIONS` (these environment variables are substituted by systemd);
- dockerd invokes other docker commands at runtime, such as docker-proxy, so the directory containing the docker binaries must be added to the PATH environment variable;
- flanneld writes its network configuration to `/run/flannel/docker` at startup; dockerd reads the `DOCKER_NETWORK_OPTIONS` environment variable from that file before starting and uses it to set the docker0 bridge subnet;
- if multiple `EnvironmentFile` options are specified, `/run/flannel/docker` must come last (ensuring docker0 uses the bip parameter generated by flanneld);
- docker must run as the root user;
- starting with version 1.13, docker may set the default policy of the iptables FORWARD chain to DROP, which makes pings to Pod IPs on other Nodes fail. If this happens, set the policy back to ACCEPT manually:
$ sudo iptables -P FORWARD ACCEPT
and write the following command into /etc/rc.local so that a node reboot does not reset the FORWARD chain default policy to DROP (a sketch that applies this on all nodes follows this list):
/sbin/iptables -P FORWARD ACCEPT
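A convenience sketch that applies and persists the FORWARD policy on every node, reusing the NODE_IPS array from environment.sh:
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "iptables -P FORWARD ACCEPT"
  ssh root@${node_ip} "grep -q 'iptables -P FORWARD ACCEPT' /etc/rc.local || echo '/sbin/iptables -P FORWARD ACCEPT' >> /etc/rc.local"
done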
Distribute the systemd unit file to all worker machines:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
sed -i -e "s|##DOCKER_DIR##|${DOCKER_DIR}|" docker.service
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
scp docker.service root@${node_ip}:/etc/systemd/system/
done
Configure and distribute the docker configuration file
Use domestic registry mirrors to speed up image pulls and increase the download concurrency (dockerd must be restarted for this to take effect):
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > docker-daemon.json <<EOF
{
"registry-mirrors": ["https://docker.mirrors.ustc.edu.cn","https://hub-mirror.c.163.com"],
"insecure-registries": ["docker02:35000"],
"max-concurrent-downloads": 20,
"live-restore": true,
"max-concurrent-uploads": 10,
"debug": true,
"data-root": "${DOCKER_DIR}/data",
"exec-root": "${DOCKER_DIR}/exec",
"log-opts": {
"max-size": "100m",
"max-file": "5"
}
}
EOF
Distribute the docker configuration file to all worker nodes:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p /etc/docker/ ${DOCKER_DIR}/{data,exec}"
scp docker-daemon.json root@${node_ip}:/etc/docker/daemon.json
done
Start the docker service
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable docker && systemctl restart docker"
done
Check the service status
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status docker|grep Active"
done
Make sure the state is `active (running)`; otherwise check the logs to find the cause:
journalctl -u docker
Check the docker0 bridge
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "/usr/sbin/ip addr show flannel.1 && /usr/sbin/ip addr show docker0"
done
Confirm that on each worker node the docker0 bridge and the flannel.1 interface have IPs in the same subnet (below, flannel.1's 172.30.24.0/32 lies within docker0's 172.30.24.1/21 network):
>>> 192.168.75.110
3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether ea:90:d9:9a:7c:a7 brd ff:ff:ff:ff:ff:ff
inet 172.30.24.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:a8:55:ff:36 brd ff:ff:ff:ff:ff:ff
inet 172.30.24.1/21 brd 172.30.31.255 scope global docker0
valid_lft forever preferred_lft forever
Note: if the services were installed in the wrong order or the machine environment is complicated, e.g. the docker service was installed before the flanneld service, the docker0 bridge and the flannel.1 interface on a worker node may not end up in the same subnet. In that case, stop the docker service, delete the docker0 interface manually, and restart the docker service to fix it:
systemctl stop docker
ip link delete docker0
systemctl start docker
Check docker status information
[root@kube-node1 work]# ps -elfH|grep docker
4 S root 22497 1 0 80 0 - 108496 ep_pol 20:44 ? 00:00:00 /opt/k8s/bin/dockerd --bip=172.30.24.1/21 --ip-masq=false --mtu=1450
4 S root 22515 22497 0 80 0 - 136798 futex_ 20:44 ? 00:00:00 containerd --config /data/k8s/docker/exec/containerd/containerd.toml --log-level debug
[root@kube-node1 work]# docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 18.09.6
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.4.199-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 1.936GiB
Name: kube-node1
ID: MQYP:O7RJ:F22K:TYEC:C5UW:XOLP:XRMF:VF6J:6JVH:AMGN:YLAI:U2FJ
Docker Root Dir: /data/k8s/docker/data
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 22
Goroutines: 43
System Time: 2019-11-07T20:48:23.252463652+08:00
EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
docker02:35000
127.0.0.0/8
Registry Mirrors:
https://docker.mirrors.ustc.edu.cn/
https://hub-mirror.c.163.com/
Live Restore Enabled: true
Product License: Community Engine
07-2. Deploy the kubelet component
kubelet runs on every worker node. It receives requests from kube-apiserver, manages Pod containers, and executes interactive commands such as exec, run, and logs.
On startup, kubelet automatically registers node information with kube-apiserver; its built-in cadvisor collects and monitors the node's resource usage.
For security, this deployment closes kubelet's insecure http port, authenticates and authorizes requests, and rejects unauthorized access (e.g. requests from apiserver or heapster).
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, and files and commands are then distributed remotely.
Download and distribute the kubelet binaries
Install dependencies
Create the kubelet bootstrap kubeconfig file
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
echo ">>> ${node_name}"
# create a token
export BOOTSTRAP_TOKEN=$(kubeadm token create \
--description kubelet-bootstrap-token \
--groups system:bootstrappers:${node_name} \
--kubeconfig ~/.kube/config)
# set cluster parameters
kubectl config set-cluster kubernetes \
--certificate-authority=/etc/kubernetes/cert/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
# set client authentication parameters
kubectl config set-credentials kubelet-bootstrap \
--token=${BOOTSTRAP_TOKEN} \
--kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
# set context parameters
kubectl config set-context default \
--cluster=kubernetes \
--user=kubelet-bootstrap \
--kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
# set the default context
kubectl config use-context default --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
done
- The kubeconfig stores only the token; after bootstrapping finishes, kube-controller-manager creates the client and server certificates for the kubelet;
Check the tokens kubeadm created for each node:
[root@kube-node1 work]# kubeadm token list --kubeconfig ~/.kube/config
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
83n69a.70n786zxgkhl1agc 23h 2019-11-08T20:52:48+08:00 authentication,signing kubelet-bootstrap-token system:bootstrappers:kube-node1
99ljss.x7u9m04h01js5juo 23h 2019-11-08T20:52:48+08:00 authentication,signing kubelet-bootstrap-token system:bootstrappers:kube-node2
9pfh4d.2on6eizmkzy3pgr1 23h 2019-11-08T20:52:48+08:00 authentication,signing kubelet-bootstrap-token system:bootstrappers:kube-node3
- The tokens are valid for 1 day; once expired they can no longer be used to bootstrap a kubelet and are cleaned up by kube-controller-manager's tokencleaner;
- when kube-apiserver accepts a kubelet bootstrap token, it sets the request's user to `system:bootstrap:<Token ID>` and its group to `system:bootstrappers`; a ClusterRoleBinding is created for this group later;
Check the Secret associated with each token:
[root@kube-node1 work]# kubectl get secrets -n kube-system|grep bootstrap-token
bootstrap-token-83n69a bootstrap.kubernetes.io/token 7 63s
bootstrap-token-99ljss bootstrap.kubernetes.io/token 7 62s
bootstrap-token-9pfh4d bootstrap.kubernetes.io/token 7 62s
Distribute the bootstrap kubeconfig files to all worker nodes
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
echo ">>> ${node_name}"
scp kubelet-bootstrap-${node_name}.kubeconfig root@${node_name}:/etc/kubernetes/kubelet-bootstrap.kubeconfig
done
Create and distribute the kubelet parameter configuration file
Since v1.10, some kubelet parameters must be set in a configuration file; `kubelet --help` warns:
DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag
Create the kubelet parameter configuration file template (see the comments in the source code for the available options):
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > kubelet-config.yaml.template <<EOF
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
address: "##NODE_IP##"
staticPodPath: ""
syncFrequency: 1m
fileCheckFrequency: 20s
httpCheckFrequency: 20s
staticPodURL: ""
port: 10250
readOnlyPort: 0
rotateCertificates: true
serverTLSBootstrap: true
authentication:
anonymous:
enabled: false
webhook:
enabled: true
x509:
clientCAFile: "/etc/kubernetes/cert/ca.pem"
authorization:
mode: Webhook
registryPullQPS: 0
registryBurst: 20
eventRecordQPS: 0
eventBurst: 20
enableDebuggingHandlers: true
enableContentionProfiling: true
healthzPort: 10248
healthzBindAddress: "##NODE_IP##"
clusterDomain: "${CLUSTER_DNS_DOMAIN}"
clusterDNS:
- "${CLUSTER_DNS_SVC_IP}"
nodeStatusUpdateFrequency: 10s
nodeStatusReportFrequency: 1m
imageMinimumGCAge: 2m
imageGCHighThresholdPercent: 85
imageGCLowThresholdPercent: 80
volumeStatsAggPeriod: 1m
kubeletCgroups: ""
systemCgroups: ""
cgroupRoot: ""
cgroupsPerQOS: true
cgroupDriver: cgroupfs
runtimeRequestTimeout: 10m
hairpinMode: promiscuous-bridge
maxPods: 220
podCIDR: "${CLUSTER_CIDR}"
podPidsLimit: -1
resolvConf: /etc/resolv.conf
maxOpenFiles: 1000000
kubeAPIQPS: 1000
kubeAPIBurst: 2000
serializeImagePulls: false
evictionHard:
memory.available: "100Mi"
nodefs.available: "10%"
nodefs.inodesFree: "5%"
imagefs.available: "15%"
evictionSoft: {}
enableControllerAttachDetach: true
failSwapOn: true
containerLogMaxSize: 20Mi
containerLogMaxFiles: 10
systemReserved: {}
kubeReserved: {}
systemReservedCgroup: ""
kubeReservedCgroup: ""
enforceNodeAllocatable: ["pods"]
EOF
- address: the address kubelet's secure port (https, 10250) listens on; it must not be 127.0.0.1, otherwise kube-apiserver, heapster, etc. cannot call kubelet's API;
- readOnlyPort=0: closes the read-only port (default 10255), equivalent to leaving it unset;
- authentication.anonymous.enabled: set to false, disallowing anonymous access to port 10250;
- authentication.x509.clientCAFile: the CA certificate that signed the client certificates; enables HTTPS certificate authentication;
- authentication.webhook.enabled=true: enables HTTPS bearer token authentication;
- requests that pass neither x509 certificate nor webhook authentication (from kube-apiserver or other clients) are rejected with Unauthorized;
- authorization.mode=Webhook: kubelet uses the SubjectAccessReview API to ask kube-apiserver whether a user/group has permission to operate on a resource (RBAC);
- featureGates.RotateKubeletClientCertificate and featureGates.RotateKubeletServerCertificate: rotate certificates automatically; the certificates' validity is determined by kube-controller-manager's --experimental-cluster-signing-duration flag;
- must be run as the root account;
Create and distribute the kubelet configuration file for each node:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
sed -e "s/##NODE_IP##/${node_ip}/" kubelet-config.yaml.template > kubelet-config-${node_ip}.yaml.template
scp kubelet-config-${node_ip}.yaml.template root@${node_ip}:/etc/kubernetes/kubelet-config.yaml
done
Create and distribute the kubelet systemd unit file
Create the kubelet systemd unit template:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > kubelet.service.template <<EOF
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service
[Service]
WorkingDirectory=${K8S_DIR}/kubelet
ExecStart=/opt/k8s/bin/kubelet \\
--allow-privileged=true \\
--bootstrap-kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig \\
--cert-dir=/etc/kubernetes/cert \\
--cni-conf-dir=/etc/cni/net.d \\
--container-runtime=docker \\
--container-runtime-endpoint=unix:///var/run/dockershim.sock \\
--root-dir=${K8S_DIR}/kubelet \\
--kubeconfig=/etc/kubernetes/kubelet.kubeconfig \\
--config=/etc/kubernetes/kubelet-config.yaml \\
--hostname-override=##NODE_NAME## \\
--pod-infra-container-image=registry.cn-beijing.aliyuncs.com/images_k8s/pause-amd64:3.1 \\
--image-pull-progress-deadline=15m \\
--volume-plugin-dir=${K8S_DIR}/kubelet/kubelet-plugins/volume/exec/ \\
--logtostderr=true \\
--v=2
Restart=always
RestartSec=5
StartLimitInterval=0
[Install]
WantedBy=multi-user.target
EOF
- If `--hostname-override` is set, `kube-proxy` must set the same option, otherwise the Node will not be found;
- `--bootstrap-kubeconfig`: points to the bootstrap kubeconfig file; kubelet uses the username and token in it to send a TLS Bootstrapping request to kube-apiserver;
- after K8S approves the kubelet's CSR, the certificate and private key are written to the `--cert-dir` directory, and the `--kubeconfig` file is written;
- `--pod-infra-container-image` does not use redhat's `pod-infrastructure:latest` image, which cannot reap zombie containers;
Create and distribute a kubelet systemd unit file for each node:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
echo ">>> ${node_name}"
sed -e "s/##NODE_NAME##/${node_name}/" kubelet.service.template > kubelet-${node_name}.service
scp kubelet-${node_name}.service root@${node_name}:/etc/systemd/system/kubelet.service
done
Bootstrap Token Auth and granting permissions
On startup, kubelet checks whether the file referenced by its `--kubeconfig` flag exists; if it does not, kubelet uses the kubeconfig specified by `--bootstrap-kubeconfig` to send a certificate signing request (CSR) to kube-apiserver.
When kube-apiserver receives the CSR, it authenticates the token in it; on success it sets the request's user to `system:bootstrap:<Token ID>` and its group to `system:bootstrappers`. This process is called Bootstrap Token Auth.
By default this user and group have no permission to create CSRs, so kubelet fails to start with errors like:
$ sudo journalctl -u kubelet -a |grep -A 2 'certificatesigningrequests'
May 26 12:13:41 zhangjun-k8s01 kubelet[128468]: I0526 12:13:41.798230 128468 certificate_manager.go:366] Rotating certificates
May 26 12:13:41 zhangjun-k8s01 kubelet[128468]: E0526 12:13:41.801997 128468 certificate_manager.go:385] Failed while requesting a signed certificate from the master: cannot cre
ate certificate signing request: certificatesigningrequests.certificates.k8s.io is forbidden: User "system:bootstrap:82jfrm" cannot create resource "certificatesigningrequests" i
n API group "certificates.k8s.io" at the cluster scope
May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.044828 128468 kubelet.go:2244] node "zhangjun-k8s01" not found
May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.078658 128468 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: Unauthor
ized
May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.079873 128468 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Unauthorize
d
May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.082683 128468 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.CSIDriver: Unau
thorized
May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.084473 128468 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Unau
thorized
May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.088466 128468 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.RuntimeClass: U
nauthorized
The fix is to create a clusterrolebinding that binds the group system:bootstrappers to the clusterrole system:node-bootstrapper:
$ kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --group=system:bootstrappers
Start the kubelet service
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kubelet/kubelet-plugins/volume/exec/"
ssh root@${node_ip} "/usr/sbin/swapoff -a"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kubelet && systemctl restart kubelet"
done
- The working directory must be created before starting the service;
- swap must be disabled, otherwise kubelet fails to start;
$ journalctl -u kubelet |tail
8月 15 12:16:49 zhangjun-k8s01 kubelet[7807]: I0815 12:16:49.578598 7807 feature_gate.go:230] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
8月 15 12:16:49 zhangjun-k8s01 kubelet[7807]: I0815 12:16:49.578698 7807 feature_gate.go:230] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.205871 7807 mount_linux.go:214] Detected OS with systemd
8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.205939 7807 server.go:408] Version: v1.11.2
8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.206013 7807 feature_gate.go:230] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.206101 7807 feature_gate.go:230] feature gates: &{map[RotateKubeletServerCertificate:true RotateKubeletClientCertificate:true]}
8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.206217 7807 plugins.go:97] No cloud provider specified.
8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.206237 7807 server.go:524] No cloud provider specified: "" from the config file: ""
8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.206264 7807 bootstrap.go:56] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.208628 7807 bootstrap.go:86] No valid private key and/or certificate found, reusing existing private key or creating a new one
After startup, kubelet uses --bootstrap-kubeconfig to send a CSR to kube-apiserver. Once the CSR is approved, kube-controller-manager creates a TLS client certificate and private key for the kubelet and writes the --kubeconfig file.
Note: kube-controller-manager creates certificates and private keys for TLS Bootstrap only if its `--cluster-signing-cert-file` and `--cluster-signing-key-file` flags are configured.
[root@kube-node1 work]# kubectl get csr
NAME AGE REQUESTOR CONDITION
csr-4stvn 67m system:bootstrap:9pfh4d Pending
csr-5dc4g 18m system:bootstrap:99ljss Pending
csr-5xbbr 18m system:bootstrap:9pfh4d Pending
csr-6599v 64m system:bootstrap:83n69a Pending
csr-7z2mv 3m34s system:bootstrap:9pfh4d Pending
csr-89fmf 3m35s system:bootstrap:99ljss Pending
csr-9kqzb 34m system:bootstrap:83n69a Pending
csr-c6chv 3m38s system:bootstrap:83n69a Pending
csr-cxk4d 49m system:bootstrap:83n69a Pending
csr-h7prh 49m system:bootstrap:9pfh4d Pending
csr-jh6hp 34m system:bootstrap:9pfh4d Pending
csr-jwv9x 64m system:bootstrap:99ljss Pending
csr-k8ss7 18m system:bootstrap:83n69a Pending
csr-nnwwm 49m system:bootstrap:99ljss Pending
csr-q87ps 67m system:bootstrap:99ljss Pending
csr-t4bb5 64m system:bootstrap:9pfh4d Pending
csr-wpjh5 34m system:bootstrap:99ljss Pending
csr-zmrbh 67m system:bootstrap:83n69a Pending
[root@kube-node1 work]# kubectl get nodes
No resources found.
- The CSRs of all three worker nodes are in Pending state;
Automatically approve CSR requests
Create three ClusterRoleBindings, used to automatically approve client certificates and renew client and server certificates:
cd /opt/k8s/work
cat > csr-crb.yaml <<EOF
# Approve all CSRs for the group "system:bootstrappers"
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: auto-approve-csrs-for-group
subjects:
- kind: Group
name: system:bootstrappers
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: ClusterRole
name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
apiGroup: rbac.authorization.k8s.io
---
# To let a node of the group "system:nodes" renew its own credentials
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: node-client-cert-renewal
subjects:
- kind: Group
name: system:nodes
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: ClusterRole
name: system:certificates.k8s.io:certificatesigningrequests:selfnodeclient
apiGroup: rbac.authorization.k8s.io
---
# A ClusterRole which instructs the CSR approver to approve a node requesting a
# serving cert matching its client cert.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: approve-node-server-renewal-csr
rules:
- apiGroups: ["certificates.k8s.io"]
resources: ["certificatesigningrequests/selfnodeserver"]
verbs: ["create"]
---
# To let a node of the group "system:nodes" renew its own server credentials
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: node-server-cert-renewal
subjects:
- kind: Group
name: system:nodes
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: ClusterRole
name: approve-node-server-renewal-csr
apiGroup: rbac.authorization.k8s.io
EOF
kubectl apply -f csr-crb.yaml
- auto-approve-csrs-for-group: automatically approves a node's first CSR; note that for the first CSR the requesting group is system:bootstrappers;
- node-client-cert-renewal: automatically approves renewal of a node's expiring client certificates; the automatically generated certificates have group system:nodes;
- node-server-cert-renewal: automatically approves renewal of a node's expiring server certificates; the automatically generated certificates have group system:nodes (a sanity check is sketched after this list);
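A quick sanity-check sketch (assuming kubectl admin access): the bindings and the custom role should now exist:
kubectl get clusterrolebindings auto-approve-csrs-for-group node-client-cert-renewal node-server-cert-renewal
kubectl get clusterrole approve-node-server-renewal-csr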
Check kubelet status
After a while (1-10 minutes), the CSRs of all three nodes are automatically approved:
[root@kube-node1 work]# kubectl get csr
NAME AGE REQUESTOR CONDITION
csr-4stvn 70m system:bootstrap:9pfh4d Pending
csr-5dc4g 22m system:bootstrap:99ljss Pending
csr-5xbbr 22m system:bootstrap:9pfh4d Pending
csr-6599v 67m system:bootstrap:83n69a Pending
csr-7z2mv 7m22s system:bootstrap:9pfh4d Approved,Issued
csr-89fmf 7m23s system:bootstrap:99ljss Approved,Issued
csr-9kqzb 37m system:bootstrap:83n69a Pending
csr-c6chv 7m26s system:bootstrap:83n69a Approved,Issued
csr-cxk4d 52m system:bootstrap:83n69a Pending
csr-h7prh 52m system:bootstrap:9pfh4d Pending
csr-jfvv4 30s system:node:kube-node1 Pending
csr-jh6hp 37m system:bootstrap:9pfh4d Pending
csr-jwv9x 67m system:bootstrap:99ljss Pending
csr-k8ss7 22m system:bootstrap:83n69a Pending
csr-nnwwm 52m system:bootstrap:99ljss Pending
csr-q87ps 70m system:bootstrap:99ljss Pending
csr-t4bb5 67m system:bootstrap:9pfh4d Pending
csr-w2w2k 16s system:node:kube-node3 Pending
csr-wpjh5 37m system:bootstrap:99ljss Pending
csr-z5nww 23s system:node:kube-node2 Pending
csr-zmrbh 70m system:bootstrap:83n69a Pending
- The Pending CSRs are for kubelet server certificates and must be approved manually; see below.
All nodes are ready:
[root@kube-node1 work]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube-node1 Ready <none> 76s v1.14.2
kube-node2 Ready <none> 69s v1.14.2
kube-node3 Ready <none> 61s v1.14.2
kube-controller-manager generated a kubeconfig file and a key pair for each node:
[root@kube-node1 work]# ls -l /etc/kubernetes/kubelet.kubeconfig
-rw------- 1 root root 2310 Nov 7 21:04 /etc/kubernetes/kubelet.kubeconfig
[root@kube-node1 work]# ls -l /etc/kubernetes/cert/|grep kubelet
-rw------- 1 root root 1277 Nov 7 22:11 kubelet-client-2019-11-07-22-11-52.pem
lrwxrwxrwx 1 root root 59 Nov 7 22:11 kubelet-client-current.pem -> /etc/kubernetes/cert/kubelet-client-2019-11-07-22-11-52.pem
- No kubelet server certificate was generated automatically;
Manually approve server cert CSRs
For security reasons, the CSR approving controllers do not automatically approve kubelet server certificate signing requests; they must be approved manually:
# the output below depends on your environment
# kubectl get csr
NAME AGE REQUESTOR CONDITION
csr-5f4vh 9m25s system:bootstrap:82jfrm Approved,Issued
csr-5r7j7 6m11s system:node:zhangjun-k8s03 Pending
csr-5rw7s 9m23s system:bootstrap:b1f7np Approved,Issued
csr-9snww 8m3s system:bootstrap:82jfrm Approved,Issued
csr-c7z56 6m12s system:node:zhangjun-k8s02 Pending
csr-j55lh 6m12s system:node:zhangjun-k8s01 Pending
csr-m29fm 9m25s system:bootstrap:3gzd53 Approved,Issued
csr-rc8w7 8m3s system:bootstrap:3gzd53 Approved,Issued
csr-vd52r 8m2s system:bootstrap:b1f7np Approved,Issued
# kubectl certificate approve csr-5r7j7
certificatesigningrequest.certificates.k8s.io/csr-5r7j7 approved
# kubectl certificate approve csr-c7z56
certificatesigningrequest.certificates.k8s.io/csr-c7z56 approved
# kubectl certificate approve csr-j55lh
certificatesigningrequest.certificates.k8s.io/csr-j55lh approved
[root@kube-node1 work]# ls -l /etc/kubernetes/cert/kubelet-*
-rw------- 1 root root 1277 Nov 7 22:11 /etc/kubernetes/cert/kubelet-client-2019-11-07-22-11-52.pem
lrwxrwxrwx 1 root root 59 Nov 7 22:11 /etc/kubernetes/cert/kubelet-client-current.pem -> /etc/kubernetes/cert/kubelet-client-2019-11-07-22-11-52.pem
-rw------- 1 root root 1317 Nov 7 22:23 /etc/kubernetes/cert/kubelet-server-2019-11-07-22-23-05.pem
lrwxrwxrwx 1 root root 59 Nov 7 22:23 /etc/kubernetes/cert/kubelet-server-current.pem -> /etc/kubernetes/cert/kubelet-server-2019-11-07-22-23-05.pem
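A hedged convenience sketch for approving all remaining pending CSRs at once (review the list first):
kubectl get csr | grep Pending | awk '{print $1}' | xargs kubectl certificate approve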
API endpoints provided by kubelet
After startup, kubelet listens on several ports for requests from kube-apiserver or other clients:
[root@kube-node1 work]# netstat -lnpt|grep kubelet
tcp 0 0 127.0.0.1:38735 0.0.0.0:* LISTEN 24609/kubelet
tcp 0 0 192.168.75.110:10248 0.0.0.0:* LISTEN 24609/kubelet
tcp 0 0 192.168.75.110:10250 0.0.0.0:* LISTEN 24609/kubelet
- 10248: healthz http service;
- 10250: https service; access requires authentication and authorization (even /healthz does);
- the read-only port 10255 is not enabled;
- since K8S v1.10 the `--cadvisor-port` flag (default port 4194) has been removed; the cAdvisor UI & API are no longer accessible.
For example, when running `kubectl exec -it nginx-ds-5rmws -- sh`, kube-apiserver sends the following request to kubelet:
POST /exec/default/nginx-ds-5rmws/my-nginx?command=sh&input=1&output=1&tty=1
On port 10250 kubelet serves https requests for the following resources:
- /pods、/runningpods
- /metrics、/metrics/cadvisor、/metrics/probes
- /spec
- /stats、/stats/container
- /logs
- /run/、/exec/, /attach/, /portForward/, /containerLogs/
For details see: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/server/server.go#L434:3
Since anonymous authentication is disabled and webhook authorization is enabled, every request to the https API on port 10250 must be authenticated and authorized.
The predefined ClusterRole system:kubelet-api-admin grants access to all kubelet APIs (the User of the kubernetes certificate used by kube-apiserver has been granted this permission):
[root@kube-node1 work]# kubectl describe clusterrole system:kubelet-api-admin
Name: system:kubelet-api-admin
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
nodes/log [] [] [*]
nodes/metrics [] [] [*]
nodes/proxy [] [] [*]
nodes/spec [] [] [*]
nodes/stats [] [] [*]
nodes [] [] [get list watch proxy]
kubelet API authentication and authorization
kubelet is configured with the following authentication parameters:
- authentication.anonymous.enabled: set to false, disallowing anonymous access to port 10250;
- authentication.x509.clientCAFile: the CA certificate that signed the client certificates; enables HTTPS certificate authentication;
- authentication.webhook.enabled=true: enables HTTPS bearer token authentication;
and the following authorization parameter:
- authorization.mode=Webhook: enables RBAC authorization;
When kubelet receives a request, it authenticates the certificate signature against clientCAFile or checks whether the bearer token is valid. If both fail, the request is rejected with Unauthorized:
[root@kube-node1 ~]# curl -s --cacert /etc/kubernetes/cert/ca.pem https://192.168.75.110:10250/metrics
Unauthorized
[root@kube-node1 ~]# curl -s --cacert /etc/kubernetes/cert/ca.pem -H "Authorization: Bearer 123456" https://192.168.75.110:10250/metrics
Unauthorized
After authentication succeeds, kubelet sends a SubjectAccessReview request to kube-apiserver to check whether the user/group behind the certificate or token has permission to operate on the resource (RBAC);
Certificate authentication and authorization
$ # a certificate with insufficient permissions;
[root@kube-node1 ~]# curl -s --cacert /etc/kubernetes/cert/ca.pem --cert /etc/kubernetes/cert/kube-controller-manager.pem --key /etc/kubernetes/cert/kube-controller-manager-key.pem https://192.168.75.110:10250/metrics
Forbidden (user=system:kube-controller-manager, verb=get, resource=nodes, subresource=metrics)
# use the admin certificate with the highest privileges, created when deploying the kubectl command-line tool
[root@kube-node1 work]# curl -s --cacert /etc/kubernetes/cert/ca.pem --cert /opt/k8s/work/admin.pem --key /opt/k8s/work/admin-key.pem https://192.168.75.110:10250/metrics|head
# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
- The values of `--cacert`, `--cert`, and `--key` must be file paths; for a relative path such as `./admin.pem` above, the `./` must not be omitted, otherwise the request returns `401 Unauthorized`;
Bearer token authentication and authorization
Create a ServiceAccount and bind it to the ClusterRole system:kubelet-api-admin so that it has permission to call the kubelet API:
kubectl create sa kubelet-api-test
kubectl create clusterrolebinding kubelet-api-test --clusterrole=system:kubelet-api-admin --serviceaccount=default:kubelet-api-test
SECRET=$(kubectl get secrets | grep kubelet-api-test | awk '{print $1}')
TOKEN=$(kubectl describe secret ${SECRET} | grep -E '^token' | awk '{print $2}')
echo ${TOKEN}
[root@kube-node1 work]# curl -s --cacert /etc/kubernetes/cert/ca.pem -H "Authorization: Bearer ${TOKEN}" https://192.168.75.110:10250/metrics|head
# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
cadvisor and metrics
cadvisor is embedded in the kubelet binary; it collects the resource usage (CPU, memory, disk, network) of the containers on its node.
Opening https://192.168.75.110:10250/metrics and https://192.168.75.110:10250/metrics/cadvisor in a browser returns kubelet's and cadvisor's metrics respectively.
Note:
- kubelet-config.yaml sets authentication.anonymous.enabled to false, so anonymous access to the https service on 10250 is not allowed;
- see A.瀏覽器訪問kube-apiserver安全端口.md to create and import the required certificates, then access port 10250 as above;
Get kubelet's configuration
Fetch each node's kubelet configuration from kube-apiserver:
$ # use the admin certificate with the highest privileges, created when deploying the kubectl command-line tool;
[root@kube-node1 work]# source /opt/k8s/bin/environment.sh
[root@kube-node1 work]# curl -sSL --cacert /etc/kubernetes/cert/ca.pem --cert /opt/k8s/work/admin.pem --key /opt/k8s/work/admin-key.pem ${KUBE_APISERVER}/api/v1/nodes/kube-node1/proxy/configz | jq \
> '.kubeletconfig|.kind="KubeletConfiguration"|.apiVersion="kubelet.config.k8s.io/v1beta1"'
{
"syncFrequency": "1m0s",
"fileCheckFrequency": "20s",
"httpCheckFrequency": "20s",
"address": "192.168.75.110",
"port": 10250,
"rotateCertificates": true,
"serverTLSBootstrap": true,
"authentication": {
"x509": {
"clientCAFile": "/etc/kubernetes/cert/ca.pem"
},
"webhook": {
"enabled": true,
"cacheTTL": "2m0s"
},
"anonymous": {
"enabled": false
}
},
"authorization": {
"mode": "Webhook",
"webhook": {
"cacheAuthorizedTTL": "5m0s",
"cacheUnauthorizedTTL": "30s"
}
},
"registryPullQPS": 0,
"registryBurst": 20,
"eventRecordQPS": 0,
"eventBurst": 20,
"enableDebuggingHandlers": true,
"enableContentionProfiling": true,
"healthzPort": 10248,
"healthzBindAddress": "192.168.75.110",
"oomScoreAdj": -999,
"clusterDomain": "cluster.local",
"clusterDNS": [
"10.254.0.2"
],
"streamingConnectionIdleTimeout": "4h0m0s",
"nodeStatusUpdateFrequency": "10s",
"nodeStatusReportFrequency": "1m0s",
"nodeLeaseDurationSeconds": 40,
"imageMinimumGCAge": "2m0s",
"imageGCHighThresholdPercent": 85,
"imageGCLowThresholdPercent": 80,
"volumeStatsAggPeriod": "1m0s",
"cgroupsPerQOS": true,
"cgroupDriver": "cgroupfs",
"cpuManagerPolicy": "none",
"cpuManagerReconcilePeriod": "10s",
"runtimeRequestTimeout": "10m0s",
"hairpinMode": "promiscuous-bridge",
"maxPods": 220,
"podCIDR": "172.30.0.0/16",
"podPidsLimit": -1,
"resolvConf": "/etc/resolv.conf",
"cpuCFSQuota": true,
"cpuCFSQuotaPeriod": "100ms",
"maxOpenFiles": 1000000,
"contentType": "application/vnd.kubernetes.protobuf",
"kubeAPIQPS": 1000,
"kubeAPIBurst": 2000,
"serializeImagePulls": false,
"evictionHard": {
"memory.available": "100Mi"
},
"evictionPressureTransitionPeriod": "5m0s",
"enableControllerAttachDetach": true,
"makeIPTablesUtilChains": true,
"iptablesMasqueradeBit": 14,
"iptablesDropBit": 15,
"failSwapOn": true,
"containerLogMaxSize": "20Mi",
"containerLogMaxFiles": 10,
"configMapAndSecretChangeDetectionStrategy": "Watch",
"enforceNodeAllocatable": [
"pods"
],
"kind": "KubeletConfiguration",
"apiVersion": "kubelet.config.k8s.io/v1beta1"
}
Or see the comments in the source code.
References
- kubelet authentication and authorization: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-authentication-authorization/
07-3. Deploy the kube-proxy component
kube-proxy runs on all worker nodes. It watches the apiserver for changes to services and endpoints and creates routing rules to provide service IPs and load balancing.
This document covers deploying kube-proxy in ipvs mode.
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, and files and commands are then distributed remotely.
Download and distribute the kube-proxy binaries
Install dependencies
Each node needs the `ipvsadm` and `ipset` commands installed and the `ip_vs` kernel module loaded; a quick check is sketched below.
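A hedged verification sketch across all nodes, reusing the NODE_IPS array from environment.sh:
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "lsmod | grep ip_vs || modprobe ip_vs"
done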
Create the kube-proxy certificate
Create the certificate signing request:
cd /opt/k8s/work
cat > kube-proxy-csr.json <<EOF
{
"CN": "system:kube-proxy",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "4Paradigm"
}
]
}
EOF
- CN: sets the certificate User to `system:kube-proxy`;
- the predefined RoleBinding `system:node-proxier` binds User `system:kube-proxy` to Role `system:node-proxier`, which grants permission to call kube-apiserver's Proxy-related APIs;
- the certificate is only used by kube-proxy as a client certificate, so the hosts field is empty;
Generate the certificate and private key:
cd /opt/k8s/work
cfssl gencert -ca=/opt/k8s/work/ca.pem \
-ca-key=/opt/k8s/work/ca-key.pem \
-config=/opt/k8s/work/ca-config.json \
-profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy
ls kube-proxy*
Create and distribute the kubeconfig file
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
kubectl config set-cluster kubernetes \
--certificate-authority=/opt/k8s/work/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=kube-proxy.kubeconfig
kubectl config set-credentials kube-proxy \
--client-certificate=kube-proxy.pem \
--client-key=kube-proxy-key.pem \
--embed-certs=true \
--kubeconfig=kube-proxy.kubeconfig
kubectl config set-context default \
--cluster=kubernetes \
--user=kube-proxy \
--kubeconfig=kube-proxy.kubeconfig
kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig
- `--embed-certs=true`: embeds the contents of ca.pem and kube-proxy.pem into the generated kube-proxy.kubeconfig file (without it, only the certificate file paths are written);
Distribute the kubeconfig file:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
echo ">>> ${node_name}"
scp kube-proxy.kubeconfig root@${node_name}:/etc/kubernetes/
done
Create the kube-proxy configuration file
Since v1.10, some kube-proxy parameters can be set in a configuration file. This file can be generated with the `--write-config-to` option, or see the comments in the source code.
Create the kube-proxy config file template:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > kube-proxy-config.yaml.template <<EOF
kind: KubeProxyConfiguration
apiVersion: kubeproxy.config.k8s.io/v1alpha1
clientConnection:
burst: 200
kubeconfig: "/etc/kubernetes/kube-proxy.kubeconfig"
qps: 100
bindAddress: ##NODE_IP##
healthzBindAddress: ##NODE_IP##:10256
metricsBindAddress: ##NODE_IP##:10249
enableProfiling: true
clusterCIDR: ${CLUSTER_CIDR}
hostnameOverride: ##NODE_NAME##
mode: "ipvs"
portRange: ""
kubeProxyIPTablesConfiguration:
masqueradeAll: false
kubeProxyIPVSConfiguration:
scheduler: rr
excludeCIDRs: []
EOF
- `bindAddress`: listen address;
- `clientConnection.kubeconfig`: the kubeconfig file for connecting to the apiserver;
- `clusterCIDR`: kube-proxy uses `--cluster-cidr` to distinguish traffic inside and outside the cluster; kube-proxy SNATs requests to Service IPs only when `--cluster-cidr` or `--masquerade-all` is specified;
- `hostnameOverride`: must have the same value as the kubelet's, otherwise kube-proxy will not find the Node after startup and will not create any ipvs rules (a consistency check is sketched after this list);
- `mode`: use ipvs mode;
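A hedged consistency-check sketch: the node names registered by kubelet should match each node's hostnameOverride value:
kubectl get nodes -o name
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
  echo ">>> ${node_ip}"
  ssh root@${node_ip} "grep hostnameOverride /etc/kubernetes/kube-proxy-config.yaml"
done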
Create and distribute a kube-proxy configuration file for each node:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for (( i=0; i < 3; i++ ))
do
echo ">>> ${NODE_NAMES[i]}"
sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-proxy-config.yaml.template > kube-proxy-config-${NODE_NAMES[i]}.yaml.template
scp kube-proxy-config-${NODE_NAMES[i]}.yaml.template root@${NODE_NAMES[i]}:/etc/kubernetes/kube-proxy-config.yaml
done
Create and distribute the kube-proxy systemd unit file
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > kube-proxy.service <<EOF
[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target
[Service]
WorkingDirectory=${K8S_DIR}/kube-proxy
ExecStart=/opt/k8s/bin/kube-proxy \\
--config=/etc/kubernetes/kube-proxy-config.yaml \\
--logtostderr=true \\
--v=2
Restart=on-failure
RestartSec=5
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
Distribute the kube-proxy systemd unit file:
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
do
echo ">>> ${node_name}"
scp kube-proxy.service root@${node_name}:/etc/systemd/system/
done
Start the kube-proxy service
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-proxy"
ssh root@${node_ip} "modprobe ip_vs_rr"
ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-proxy && systemctl restart kube-proxy"
done
- The working directory must be created before starting the service;
Check the startup result
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "systemctl status kube-proxy|grep Active"
done
Make sure the state is `active (running)`; otherwise check the logs to find the cause:
journalctl -u kube-proxy
Check the listening ports
[root@kube-node1 work]# netstat -lnpt|grep kube-proxy
tcp 0 0 192.168.75.110:10249 0.0.0.0:* LISTEN 6648/kube-proxy
tcp 0 0 192.168.75.110:10256 0.0.0.0:* LISTEN 6648/kube-proxy
- 10249: http prometheus metrics port;
- 10256: http healthz port;
Check the ipvs routing rules
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh root@${node_ip} "/usr/sbin/ipvsadm -ln"
done
Expected output:
>>> 192.168.75.110
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.254.0.1:443 rr
-> 192.168.75.110:6443 Masq 1 0 0
-> 192.168.75.111:6443 Masq 1 0 0
-> 192.168.75.112:6443 Masq 1 0 0
>>> 192.168.75.111
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.254.0.1:443 rr
-> 192.168.75.110:6443 Masq 1 0 0
-> 192.168.75.111:6443 Masq 1 0 0
-> 192.168.75.112:6443 Masq 1 0 0
>>> 192.168.75.112
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.254.0.1:443 rr
-> 192.168.75.110:6443 Masq 1 0 0
-> 192.168.75.111:6443 Masq 1 0 0
-> 192.168.75.112:6443 Masq 1 0 0
As shown, all https requests to the K8S SVC kubernetes are forwarded to port 6443 on the kube-apiserver nodes;
08. Verify cluster functionality
This document uses a daemonset to verify that the master and worker nodes work properly.
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node, and files and commands are then distributed remotely.
Check node status
[root@kube-node1 work]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube-node1 Ready <none> 16h v1.14.2
kube-node2 Ready <none> 16h v1.14.2
kube-node3 Ready <none> 16h v1.14.2
Everything is normal when all nodes are Ready.
Create the test file
cd /opt/k8s/work
cat > nginx-ds.yml <<EOF
apiVersion: v1
kind: Service
metadata:
name: nginx-ds
labels:
app: nginx-ds
spec:
type: NodePort
selector:
app: nginx-ds
ports:
- name: http
port: 80
targetPort: 80
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: nginx-ds
labels:
addonmanager.kubernetes.io/mode: Reconcile
spec:
template:
metadata:
labels:
app: nginx-ds
spec:
containers:
- name: my-nginx
image: nginx:1.7.9
ports:
- containerPort: 80
EOF
Run the test
[root@kube-node1 work]# kubectl create -f nginx-ds.yml
service/nginx-ds created
daemonset.extensions/nginx-ds created
Check Pod IP connectivity across nodes
The pods are created and started gradually:
[root@kube-node1 work]# kubectl get pods -o wide|grep nginx-ds
nginx-ds-7z464 0/1 ContainerCreating 0 22s <none> kube-node2 <none> <none>
nginx-ds-hz5fd 0/1 ContainerCreating 0 22s <none> kube-node1 <none> <none>
nginx-ds-skcrt 0/1 ContainerCreating 0 22s <none> kube-node3 <none> <none>
[root@kube-node1 work]# kubectl get pods -o wide|grep nginx-ds
nginx-ds-7z464 0/1 ContainerCreating 0 34s <none> kube-node2 <none> <none>
nginx-ds-hz5fd 0/1 ContainerCreating 0 34s <none> kube-node1 <none> <none>
nginx-ds-skcrt 1/1 Running 0 34s 172.30.200.2 kube-node3 <none> <none>
[root@kube-node1 work]# kubectl get pods -o wide|grep nginx-ds
nginx-ds-7z464 1/1 Running 0 70s 172.30.40.2 kube-node2 <none> <none>
nginx-ds-hz5fd 1/1 Running 0 70s 172.30.24.2 kube-node1 <none> <none>
nginx-ds-skcrt 1/1 Running 0 70s 172.30.200.2 kube-node3 <none> <none>
As shown, the nginx-ds Pod IPs are `172.30.40.2`, `172.30.24.2`, and `172.30.200.2`. Ping these three IPs from every Node to check connectivity:
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh ${node_ip} "ping -c 1 172.30.24.2"
ssh ${node_ip} "ping -c 1 172.30.40.2"
ssh ${node_ip} "ping -c 1 172.30.200.2"
done
Check service IP and port reachability
[root@kube-node1 work]# kubectl get svc |grep nginx-ds
nginx-ds NodePort 10.254.94.213 <none> 80:32039/TCP 3m24s
As shown:
- Service Cluster IP: 10.254.94.213
- service port: 80
- NodePort: 32039
Curl the Service IP on all Nodes:
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh ${node_ip} "curl -s 10.254.94.213"
done
The expected output is the nginx welcome page.
Check the service's NodePort reachability
Run on all Nodes:
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
do
echo ">>> ${node_ip}"
ssh ${node_ip} "curl -s ${node_ip}:32039"
done
The expected output is the nginx welcome page.
09-0. Deploy cluster add-ons
Add-ons are supplementary components that enrich and extend the cluster's functionality.
Note:
- the manifest yaml files bundled with kubernetes use the gcr.io docker registry, which is blocked in China, so the registry address must be replaced manually with another one (not replaced in this document);
09-1. Deploy the coredns add-on
Note:
- unless otherwise specified, all operations in this document are performed on the kube-node1 node;
- the manifest yaml files bundled with kubernetes use the gcr.io docker registry, which is blocked in China, so the registry address must be replaced manually with another one (not replaced in this document);
- blocked images can be downloaded from the free gcr.io proxy provided by Microsoft China;
Modify the configuration file
After extracting the downloaded kubernetes-server-linux-amd64.tar.gz, extract the kubernetes-src.tar.gz inside it.
cd /opt/k8s/work/kubernetes/
tar -xzvf kubernetes-src.tar.gz
The coredns directory is `cluster/addons/dns`:
cd /opt/k8s/work/kubernetes/cluster/addons/dns/coredns
cp coredns.yaml.base coredns.yaml
source /opt/k8s/bin/environment.sh
sed -i -e "s/__PILLAR__DNS__DOMAIN__/${CLUSTER_DNS_DOMAIN}/" -e "s/__PILLAR__DNS__SERVER__/${CLUSTER_DNS_SVC_IP}/" coredns.yaml
### Note ###
In coredns.yaml the coredns image to pull is k8s.gcr.io/coredns:1.3.1, but k8s.gcr.io is blocked and unreachable, so the image source must be replaced using the proxy documented at:
http://mirror.azure.cn/help/gcr-proxy-cache.html
What to change:
Replace image: k8s.gcr.io/coredns:1.3.1 with image: gcr.azk8s.cn/google_containers/coredns:1.3.1; only then can the image be pulled, avoiding container startup errors later caused by an unpullable image. A one-liner for this swap is sketched below.
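A hedged one-liner for the image swap described above:
sed -i 's#k8s.gcr.io/coredns:1.3.1#gcr.azk8s.cn/google_containers/coredns:1.3.1#' coredns.yaml
grep 'image:' coredns.yaml   # verify the replacement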
Create coredns
kubectl create -f coredns.yaml
# Note
If you forgot to change the image address in the previous step and coredns fails to run, delete it with the commands below, fix the image address as described above, and then create it again:
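kubectl delete -f coredns.yaml
# after fixing the image address in coredns.yaml, create it again:
kubectl create -f coredns.yaml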
Check coredns functionality
[root@kube-node1 coredns]# kubectl get all -n kube-system
NAME READY STATUS RESTARTS AGE
pod/coredns-58c479c699-blpdq 1/1 Running 0 4m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 10.254.0.2 <none> 53/UDP,53/TCP,9153/TCP 4m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/coredns 1/1 1 1 4m
NAME DESIRED CURRENT READY AGE
replicaset.apps/coredns-58c479c699 1 1 1 4m
# Note: pod/coredns must be in Running state, otherwise none of the later steps can be verified
Create a new Deployment
cd /opt/k8s/work
cat > my-nginx.yaml <<EOF
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: my-nginx
spec:
replicas: 2
template:
metadata:
labels:
run: my-nginx
spec:
containers:
- name: my-nginx
image: nginx:1.7.9
ports:
- containerPort: 80
EOF
kubectl create -f my-nginx.yaml
Expose the Deployment to generate the my-nginx service:
[root@kube-node1 work]# kubectl expose deploy my-nginx
service/my-nginx exposed
[root@kube-node1 work]# kubectl get services --all-namespaces |grep my-nginx
default my-nginx ClusterIP 10.254.63.243 <none> 80/TCP 11s
Create another Pod and check whether its /etc/resolv.conf contains the --cluster-dns and --cluster-domain values configured for the kubelet, and whether it can resolve the service my-nginx to the Cluster IP shown above, 10.254.63.243:
cd /opt/k8s/work
cat > dnsutils-ds.yml <<EOF
apiVersion: v1
kind: Service
metadata:
name: dnsutils-ds
labels:
app: dnsutils-ds
spec:
type: NodePort
selector:
app: dnsutils-ds
ports:
- name: http
port: 80
targetPort: 80
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: dnsutils-ds
labels:
addonmanager.kubernetes.io/mode: Reconcile
spec:
template:
metadata:
labels:
app: dnsutils-ds
spec:
containers:
- name: my-dnsutils
image: tutum/dnsutils:latest
command:
- sleep
- "3600"
ports:
- containerPort: 80
EOF
kubectl create -f dnsutils-ds.yml
[root@kube-node1 work]# kubectl get pods -lapp=dnsutils-ds
NAME READY STATUS RESTARTS AGE
dnsutils-ds-5krtg 1/1 Running 0 64s
dnsutils-ds-cxzlg 1/1 Running 0 64s
dnsutils-ds-tln64 1/1 Running 0 64s
[root@kube-node1 work]# kubectl -it exec dnsutils-ds-5krtg bash
root@dnsutils-ds-5krtg:/# cat /etc/resolv.conf
nameserver 10.254.0.2
search default.svc.cluster.local svc.cluster.local cluster.local mshome.net
options ndots:5
Note: if none of the following checks pass, the most likely cause is that the coredns image could not be pulled; inspect the details with:
kubectl get pod -n kube-system # list the coredns pod
kubectl describe pods -n kube-system <full coredns pod name> # view its detailed description
[root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup kubernetes
Server: 10.254.0.2
Address: 10.254.0.2#53
Name: kubernetes.default.svc.cluster.local
Address: 10.254.0.1
[root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup www.baidu.com
Server: 10.254.0.2
Address: 10.254.0.2#53
Non-authoritative answer:
Name: www.baidu.com.mshome.net
Address: 218.28.144.36
[root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup my-nginx
Server: 10.254.0.2
Address: 10.254.0.2#53
Name: my-nginx.default.svc.cluster.local
Address: 10.254.63.243
[root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup kube-dns.kube-system.svc.cluster
Server: 10.254.0.2
Address: 10.254.0.2#53
Non-authoritative answer:
Name: kube-dns.kube-system.svc.cluster.mshome.net
Address: 218.28.144.37
[root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup kube-dns.kube-system.svc
Server: 10.254.0.2
Address: 10.254.0.2#53
Name: kube-dns.kube-system.svc.cluster.local
Address: 10.254.0.2
[root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup kube-dns.kube-system.svc.cluster.local
Server: 10.254.0.2
Address: 10.254.0.2#53
Name: kube-dns.kube-system.svc.cluster.local
Address: 10.254.0.2
[root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup kube-dns.kube-system.svc.cluster.local.
Server: 10.254.0.2
Address: 10.254.0.2#53
Name: kube-dns.kube-system.svc.cluster.local
Address: 10.254.0.2
References
- https://community.infoblox.com/t5/Community-Blog/CoreDNS-for-Kubernetes-Service-Discovery/ba-p/8187
- https://coredns.io/2017/03/01/coredns-for-kubernetes-service-discovery-take-2/
- https://www.cnblogs.com/boshen-hzb/p/7511432.html
- https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns
09-2. Deploying the dashboard add-on
Note:
- unless otherwise specified, all operations in this document are performed on the kube-node1 node;
- the manifest yaml files bundled with kubernetes use the gcr.io docker registry, which is blocked in mainland China; the registry address must be replaced manually (this document leaves the manifests unmodified);
- blocked images can be downloaded from the free gcr.io proxy provided by Microsoft China;
Modify the configuration file
Extract the downloaded kubernetes-server-linux-amd64.tar.gz, then extract the kubernetes-src.tar.gz inside it.
cd /opt/k8s/work/kubernetes/
tar -xzvf kubernetes-src.tar.gz
The dashboard manifests live in cluster/addons/dashboard:
cd /opt/k8s/work/kubernetes/cluster/addons/dashboard
Modify the service definition, setting the port type to NodePort so the dashboard is reachable externally at NodeIP:NodePort;
# cat dashboard-service.yaml
apiVersion: v1
kind: Service
metadata:
name: kubernetes-dashboard
namespace: kube-system
labels:
k8s-app: kubernetes-dashboard
kubernetes.io/cluster-service: "true"
addonmanager.kubernetes.io/mode: Reconcile
spec:
type: NodePort # add this line
selector:
k8s-app: kubernetes-dashboard
ports:
- port: 443
targetPort: 8443
Apply all the manifest files
# ls *.yaml
dashboard-configmap.yaml dashboard-controller.yaml dashboard-rbac.yaml dashboard-secret.yaml dashboard-service.yaml
# Note: the image address in the following file must be changed
dashboard-controller.yaml
Change image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1 to image: gcr.azk8s.cn/google_containers/kubernetes-dashboard-amd64:v1.10.1
# kubectl apply -f .
Check the assigned NodePort
[root@kube-node1 dashboard]# kubectl get deployment kubernetes-dashboard -n kube-system
NAME READY UP-TO-DATE AVAILABLE AGE
kubernetes-dashboard 1/1 1 1 14s
[root@kube-node1 dashboard]# kubectl --namespace kube-system get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-58c479c699-blpdq 1/1 Running 0 30m 172.30.200.4 kube-node3 <none> <none>
kubernetes-dashboard-64ffdff795-5rgd2 1/1 Running 0 33s 172.30.24.3 kube-node1 <none> <none>
[root@kube-node1 dashboard]# kubectl get services kubernetes-dashboard -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard NodePort 10.254.110.235 <none> 443:31673/TCP 47s
- NodePort 31673 maps to port 443 of the dashboard pod;
View the dashboard's supported command-line flags
# kubernetes-dashboard-64ffdff795-5rgd2 is the pod name
[root@kube-node1 dashboard]# kubectl exec --namespace kube-system -it kubernetes-dashboard-64ffdff795-5rgd2 -- /dashboard --help
2019/11/08 07:55:04 Starting overwatch
Usage of /dashboard:
--alsologtostderr log to standard error as well as files
--api-log-level string Level of API request logging. Should be one of 'INFO|NONE|DEBUG'. Default: 'INFO'. (default "INFO")
--apiserver-host string The address of the Kubernetes Apiserver to connect to in the format of protocol://address:port, e.g., http://localhost:8080. If not specified, the assumption is that the binary runs inside a Kubernetes cluster and local discovery is attempted.
--authentication-mode strings Enables authentication options that will be reflected on login screen. Supported values: token, basic. Default: token.Note that basic option should only be used if apiserver has '--authorization-mode=ABAC' and '--basic-auth-file' flags set. (default [token])
--auto-generate-certificates When set to true, Dashboard will automatically generate certificates used to serve HTTPS. Default: false.
--bind-address ip The IP address on which to serve the --secure-port (set to 0.0.0.0 for all interfaces). (default 0.0.0.0)
--default-cert-dir string Directory path containing '--tls-cert-file' and '--tls-key-file' files. Used also when auto-generating certificates flag is set. (default "/certs")
--disable-settings-authorizer When enabled, Dashboard settings page will not require user to be logged in and authorized to access settings page.
--enable-insecure-login When enabled, Dashboard login view will also be shown when Dashboard is not served over HTTPS. Default: false.
--enable-skip-login When enabled, the skip button on the login page will be shown. Default: false.
--heapster-host string The address of the Heapster Apiserver to connect to in the format of protocol://address:port, e.g., http://localhost:8082. If not specified, the assumption is that the binary runs inside a Kubernetes cluster and service proxy will be used.
--insecure-bind-address ip The IP address on which to serve the --port (set to 0.0.0.0 for all interfaces). (default 127.0.0.1)
--insecure-port int The port to listen to for incoming HTTP requests. (default 9090)
--kubeconfig string Path to kubeconfig file with authorization and master location information.
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--logtostderr log to standard error instead of files
--metric-client-check-period int Time in seconds that defines how often configured metric client health check should be run. Default: 30 seconds. (default 30)
--port int The secure port to listen to for incoming HTTPS requests. (default 8443)
--stderrthreshold severity logs at or above this threshold go to stderr (default 2)
--system-banner string When non-empty displays message to Dashboard users. Accepts simple HTML tags. Default: ''.
--system-banner-severity string Severity of system banner. Should be one of 'INFO|WARNING|ERROR'. Default: 'INFO'. (default "INFO")
--tls-cert-file string File containing the default x509 Certificate for HTTPS.
--tls-key-file string File containing the default x509 private key matching --tls-cert-file.
--token-ttl int Expiration time (in seconds) of JWE tokens generated by dashboard. Default: 15 min. 0 - never expires (default 900)
-v, --v Level log level for V logs
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
pflag: help requested
command terminated with exit code 2
The dashboard's --authentication-mode supports token and basic; the default is token. To use basic, kube-apiserver must be started with the --authorization-mode=ABAC and --basic-auth-file flags.
Access the dashboard
Since version 1.7 the dashboard only allows access over https, and when using kube proxy it must listen on localhost or 127.0.0.1. NodePort access has no such restriction, but is recommended only for development environments.
For logins that do not satisfy these conditions, the browser does not redirect after a successful login and stays on the login page.
- the kubernetes-dashboard service exposes a NodePort, so the dashboard can be reached at https://NodeIP:NodePort;
- through kube-apiserver;
- through kubectl proxy:
Access the dashboard through kubectl proxy
(skip this step)
Start the proxy:
$ kubectl proxy --address='localhost' --port=8086 --accept-hosts='^*$'
Starting to serve on 127.0.0.1:8086
- --address must be localhost or 127.0.0.1;
- the --accept-hosts option is required, otherwise the browser shows "Unauthorized" when opening the dashboard page;
Browse to: http://127.0.0.1:8086/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
Access the dashboard through kube-apiserver
(use this method)
Get the cluster service address list:
[root@kube-node1 work]# kubectl cluster-info
Kubernetes master is running at https://127.0.0.1:8443
CoreDNS is running at https://127.0.0.1:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
kubernetes-dashboard is running at https://127.0.0.1:8443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
- since apiserver is proxied by the local kube-nginx, the 127.0.0.1:8443 shown above is the local kube-nginx IP and port; in the browser, replace it with the IP and port kube-apiserver actually listens on, e.g. 192.168.75.110:6443;
- the dashboard must be accessed through kube-apiserver's secure (https) port, and the browser needs a custom certificate, otherwise kube-apiserver rejects the request.
- for the steps to create and import a custom certificate, see: A. Accessing the kube-apiserver secure port from a browser
Browse to: https://192.168.75.110:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
Create the token and kubeconfig file for logging in to the Dashboard
The dashboard only supports token authentication by default (client certificate authentication is not supported), so if you use a kubeconfig file, the token has to be written into it.
Create a login token
kubectl create sa dashboard-admin -n kube-system
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin
ADMIN_SECRET=$(kubectl get secrets -n kube-system | grep dashboard-admin | awk '{print $1}')
DASHBOARD_LOGIN_TOKEN=$(kubectl describe secret -n kube-system ${ADMIN_SECRET} | grep -E '^token' | awk '{print $2}')
echo ${DASHBOARD_LOGIN_TOKEN}
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4tZnpjbWwiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiY2FmYzk3MDctMDFmZi0xMWVhLThlOTctMDAwYzI5MWQxODIwIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmRhc2hib2FyZC1hZG1pbiJ9.YdK7a1YSUa-Y4boHDM2qLrI5PrimxIUd3EfuCX7GiiDVZ3EvJZQFA4_InGWcbHdZoA8AYyh2pQn-hGhiVz0lU2jLIIIFEF2zHc5su1CSISRciONv6NMrFBlTr6tNFsf6SEeEep9tvGILAFTHXPqSVsIb_lCmHeBdH_CDo4sAyLFATDYqI5Q2jBxnCU7DsD73j3LvLY9WlgpuLwAhOrNHc6USxPvB91-z-4GGbcpGIQPpDQ6OlT3cAP47zFRBIpIc2JwBZ63EmcZJqLxixgPMROqzFvV9mtx68o_GEAccsIELMEMqq9USIXibuFtQT6mV0U3p_wntIhr4OPxe5b7jvQ
Log in to the Dashboard with the token printed above.
On the browser login screen, choose the token option.
Create a kubeconfig file that uses the token
source /opt/k8s/bin/environment.sh
# set cluster parameters
kubectl config set-cluster kubernetes \
--certificate-authority=/etc/kubernetes/cert/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=dashboard.kubeconfig
# set client credential parameters, using the token created above
# note: in a shell script ${DASHBOARD_LOGIN_TOKEN} may come back empty; set the value manually in the script if so
# (the comment cannot stay on the --token line itself: text after the trailing backslash would be parsed as arguments)
kubectl config set-credentials dashboard_user \
  --token=${DASHBOARD_LOGIN_TOKEN} \
  --kubeconfig=dashboard.kubeconfig
# set context parameters
kubectl config set-context default \
--cluster=kubernetes \
--user=dashboard_user \
--kubeconfig=dashboard.kubeconfig
# set the default context
kubectl config use-context default --kubeconfig=dashboard.kubeconfig
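As a quick sanity check (not part of the original steps): if ${DASHBOARD_LOGIN_TOKEN} was empty, the kubeconfig is silently created without a token and the login fails, so it is worth confirming the token was embedded:
```bash
# the dashboard_user entry should carry a token (shown as REDACTED by config view)
kubectl config view --kubeconfig=dashboard.kubeconfig
grep -q 'token:' dashboard.kubeconfig && echo "token embedded" || echo "token MISSING"
```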
Because the Heapster add-on is absent, the dashboard currently cannot display CPU and memory statistics or charts for Pods and Nodes.
References
- https://github.com/kubernetes/dashboard/wiki/Access-control
- https://github.com/kubernetes/dashboard/issues/2558
- https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/
- https://github.com/kubernetes/dashboard/wiki/Accessing-Dashboard---1.7.X-and-above
- https://github.com/kubernetes/dashboard/issues/2540
09-3. Deploying the metrics-server add-on
Note:
- unless otherwise specified, all operations in this document are performed on the kube-node1 node;
- the manifest yaml files bundled with kubernetes use the gcr.io docker registry, which is blocked in mainland China; the registry address must be replaced manually (this document leaves the manifests unmodified);
- blocked images can be downloaded from the free gcr.io proxy provided by Microsoft China;
metrics-server discovers all nodes through kube-apiserver, then calls the kubelet APIs (over https) to obtain CPU, memory and other resource usage for each Node and Pod.
Starting with Kubernetes 1.12 the installation scripts dropped Heapster, and from 1.13 Heapster support was removed entirely; Heapster is no longer maintained.
The replacements are:
- CPU/memory HPA metrics for autoscaling: metrics-server;
- general monitoring: a third-party system that can consume Prometheus-format metrics, such as Prometheus Operator;
- event shipping: third-party tools that ship and archive kubernetes events;
The Kubernetes Dashboard does not yet support metrics-server (PR: #3504), so replacing Heapster with metrics-server means Pod memory and CPU can no longer be charted in the dashboard; compensate with a monitoring stack such as Prometheus plus Grafana.
Note: unless otherwise specified, all operations in this document are performed on the kube-node1 node.
Monitoring architecture
Install metrics-server
Clone the source from GitHub:
$ cd /opt/k8s/work/
$ git clone https://github.com/kubernetes-incubator/metrics-server.git
$ cd metrics-server/deploy/1.8+/
$ ls
aggregated-metrics-reader.yaml auth-delegator.yaml auth-reader.yaml metrics-apiservice.yaml metrics-server-deployment.yaml metrics-server-service.yaml resource-reader.yaml
Modify the metrics-server-deployment.yaml file, adding two command-line flags for metrics-server:
# cat metrics-server-deployment.yaml
34 args:
35 - --cert-dir=/tmp
36 - --secure-port=4443
37 - --metric-resolution=30s # added
38 - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP # added
The image pull address also needs to be changed:
change image: k8s.gcr.io/metrics-server-amd64:v0.3.6 to image: gcr.azk8s.cn/google_containers/metrics-server-amd64:v0.3.6
- --metric-resolution=30s: the interval at which data is collected from the kubelets;
- --kubelet-preferred-address-types: prefer InternalIP when reaching a kubelet; this avoids the failures that occur (in the default, unconfigured case) when a node name has no DNS record and the kubelet API is called by node name;
Deploy metrics-server:
# cd /opt/k8s/work/metrics-server/deploy/1.8+/
# kubectl create -f .
Check the running status
[root@kube-node1 1.8+]# kubectl -n kube-system get pods -l k8s-app=metrics-server
NAME READY STATUS RESTARTS AGE
metrics-server-65879bf98c-ghqbk 1/1 Running 0 38s
[root@kube-node1 1.8+]# kubectl get svc -n kube-system metrics-server
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
metrics-server ClusterIP 10.254.244.235 <none> 443/TCP 55s
metrics-server command-line flags
# docker run -it --rm gcr.azk8s.cn/google_containers/metrics-server-amd64:v0.3.6 --help
Launch metrics-server
Usage:
[flags]
Flags:
--alsologtostderr log to standard error as well as files
--authentication-kubeconfig string kubeconfig file pointing at the 'core' kubernetes server with enough rights to create tokenaccessreviews.authentication.k8s.io.
--authentication-skip-lookup If false, the authentication-kubeconfig will be used to lookup missing authentication configuration from the cluster.
--authentication-token-webhook-cache-ttl duration The duration to cache responses from the webhook token authenticator. (default 10s)
--authentication-tolerate-lookup-failure If true, failures to look up missing authentication configuration from the cluster are not considered fatal. Note that this can result in authentication that treats all requests as anonymous.
--authorization-always-allow-paths strings A list of HTTP paths to skip during authorization, i.e. these are authorized without contacting the 'core' kubernetes server.
--authorization-kubeconfig string kubeconfig file pointing at the 'core' kubernetes server with enough rights to create subjectaccessreviews.authorization.k8s.io.
--authorization-webhook-cache-authorized-ttl duration The duration to cache 'authorized' responses from the webhook authorizer. (default 10s)
--authorization-webhook-cache-unauthorized-ttl duration The duration to cache 'unauthorized' responses from the webhook authorizer. (default 10s)
--bind-address ip The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank, all interfaces will be used (0.0.0.0 for all IPv4 interfaces and :: for all IPv6 interfaces). (default 0.0.0.0)
--cert-dir string The directory where the TLS certs are located. If --tls-cert-file and --tls-private-key-file are provided, this flag will be ignored. (default "apiserver.local.config/certificates")
--client-ca-file string If set, any request presenting a client certificate signed by one of the authorities in the client-ca-file is authenticated with an identity corresponding to the CommonName of the client certificate.
--contention-profiling Enable lock contention profiling, if profiling is enabled
-h, --help help for this command
--http2-max-streams-per-connection int The limit that the server gives to clients for the maximum number of streams in an HTTP/2 connection. Zero means to use golang's default.
--kubeconfig string The path to the kubeconfig used to connect to the Kubernetes API server and the Kubelets (defaults to in-cluster config)
--kubelet-certificate-authority string Path to the CA to use to validate the Kubelet's serving certificates.
--kubelet-insecure-tls Do not verify CA of serving certificates presented by Kubelets. For testing purposes only.
--kubelet-port int The port to use to connect to Kubelets. (default 10250)
--kubelet-preferred-address-types strings The priority of node address types to use when determining which address to use to connect to a particular node (default [Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP])
--log-flush-frequency duration Maximum number of seconds between log flushes (default 5s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_file string If non-empty, use this log file
--logtostderr log to standard error instead of files (default true)
--metric-resolution duration The resolution at which metrics-server will retain metrics. (default 1m0s)
--profiling Enable profiling via web interface host:port/debug/pprof/ (default true)
--requestheader-allowed-names strings List of client certificate common names to allow to provide usernames in headers specified by --requestheader-username-headers. If empty, any client certificate validated by the authorities in --requestheader-client-ca-file is allowed.
--requestheader-client-ca-file string Root certificate bundle to use to verify client certificates on incoming requests before trusting usernames in headers specified by --requestheader-username-headers. WARNING: generally do not depend on authorization being already done for incoming requests.
--requestheader-extra-headers-prefix strings List of request header prefixes to inspect. X-Remote-Extra- is suggested. (default [x-remote-extra-])
--requestheader-group-headers strings List of request headers to inspect for groups. X-Remote-Group is suggested. (default [x-remote-group])
--requestheader-username-headers strings List of request headers to inspect for usernames. X-Remote-User is common. (default [x-remote-user])
--secure-port int The port on which to serve HTTPS with authentication and authorization.If 0, don't serve HTTPS at all. (default 443)
--skip_headers If true, avoid header prefixes in the log messages
--stderrthreshold severity logs at or above this threshold go to stderr
--tls-cert-file string File containing the default x509 Certificate for HTTPS. (CA cert, if any, concatenated after server cert). If HTTPS serving is enabled, and --tls-cert-file and --tls-private-key-file are not provided, a self-signed certificate and key are generated for the public address and saved to the directory specified by --cert-dir.
--tls-cipher-suites strings Comma-separated list of cipher suites for the server. If omitted, the default Go cipher suites will be use. Possible values: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_RC4_128_SHA,TLS_RSA_WITH_3DES_EDE_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_RC4_128_SHA
--tls-min-version string Minimum TLS version supported. Possible values: VersionTLS10, VersionTLS11, VersionTLS12
--tls-private-key-file string File containing the default x509 private key matching --tls-cert-file.
--tls-sni-cert-key namedCertKey A pair of x509 certificate and private key file paths, optionally suffixed with a list of domain patterns which are fully qualified domain names, possibly with prefixed wildcard segments. If no domain patterns are provided, the names of the certificate are extracted. Non-wildcard matches trump over wildcard matches, explicit domain patterns trump over extracted names. For multiple key/certificate pairs, use the --tls-sni-cert-key multiple times. Examples: "example.crt,example.key" or "foo.crt,foo.key:*.foo.com,foo.com". (default [])
-v, --v Level number for the log level verbosity
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
View the metrics exposed by metrics-server
- Through kube-apiserver or kubectl proxy, directly in a browser:
https://192.168.75.110:6443/apis/metrics.k8s.io/v1beta1/nodes
https://192.168.75.110:6443/apis/metrics.k8s.io/v1beta1/pods
- Directly with kubectl:
kubectl get --raw apis/metrics.k8s.io/v1beta1/nodes
kubectl get --raw apis/metrics.k8s.io/v1beta1/pods
# kubectl get --raw "/apis/metrics.k8s.io/v1beta1" | jq .
{
"kind": "APIResourceList",
"apiVersion": "v1",
"groupVersion": "metrics.k8s.io/v1beta1",
"resources": [
{
"name": "nodes",
"singularName": "",
"namespaced": false,
"kind": "NodeMetrics",
"verbs": [
"get",
"list"
]
},
{
"name": "pods",
"singularName": "",
"namespaced": true,
"kind": "PodMetrics",
"verbs": [
"get",
"list"
]
}
]
}
# kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .
{
"kind": "NodeMetricsList",
"apiVersion": "metrics.k8s.io/v1beta1",
"metadata": {
"selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
},
"items": [
{
"metadata": {
"name": "zhangjun-k8s01",
"selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/zhangjun-k8s01",
"creationTimestamp": "2019-05-26T10:55:10Z"
},
"timestamp": "2019-05-26T10:54:52Z",
"window": "30s",
"usage": {
"cpu": "311155148n",
"memory": "2881016Ki"
}
},
{
"metadata": {
"name": "zhangjun-k8s02",
"selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/zhangjun-k8s02",
"creationTimestamp": "2019-05-26T10:55:10Z"
},
"timestamp": "2019-05-26T10:54:54Z",
"window": "30s",
"usage": {
"cpu": "253796835n",
"memory": "1028836Ki"
}
},
{
"metadata": {
"name": "zhangjun-k8s03",
"selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/zhangjun-k8s03",
"creationTimestamp": "2019-05-26T10:55:10Z"
},
"timestamp": "2019-05-26T10:54:54Z",
"window": "30s",
"usage": {
"cpu": "280441339n",
"memory": "1072772Ki"
}
}
]
}
- the usage returned by /apis/metrics.k8s.io/v1beta1/nodes and /apis/metrics.k8s.io/v1beta1/pods contains CPU and Memory;
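Since jq is among the installed dependencies, the per-node usage can be extracted directly; a small sketch:
```bash
# print one line per node: name, cpu (nanocores) and memory (Ki)
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" \
  | jq -r '.items[] | "\(.metadata.name)  cpu=\(.usage.cpu)  memory=\(.usage.memory)"'
```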
Use kubectl top to view cluster node resource usage
kubectl top fetches basic node metrics from metrics-server:
[root@kube-node1 1.8+]# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
kube-node1 125m 3% 833Mi 44%
kube-node2 166m 4% 891Mi 47%
kube-node3 126m 3% 770Mi 40%
References
- https://kubernetes.feisky.xyz/zh/addons/metrics.html
- metrics-server RBAC:https://github.com/kubernetes-incubator/metrics-server/issues/40
- metrics-server 參數:https://github.com/kubernetes-incubator/metrics-server/issues/25
- https://kubernetes.io/docs/tasks/debug-application-cluster/core-metrics-pipeline/
- the metrics-server APIs documentation.
09-4. Deploying the EFK add-on
Note:
- unless otherwise specified, all operations in this document are performed on the kube-node1 node.
- the manifest yaml files bundled with kubernetes use the gcr.io docker registry, which is blocked in mainland China; the registry address must be replaced manually;
- blocked images can be downloaded from the free gcr.io proxy provided by Microsoft China;
Modify the configuration files
Extract the downloaded kubernetes-server-linux-amd64.tar.gz, then extract the kubernetes-src.tar.gz inside it.
cd /opt/k8s/work/kubernetes/
tar -xzvf kubernetes-src.tar.gz
The EFK manifests live in kubernetes/cluster/addons/fluentd-elasticsearch.
# cd /opt/k8s/work/kubernetes/cluster/addons/fluentd-elasticsearch
# vim fluentd-es-ds.yaml
Change path: /var/lib/docker/containers to path: /data/k8s/docker/data/containers/
Change image: k8s.gcr.io/fluentd-elasticsearch:v2.4.0 to image: gcr.azk8s.cn/google_containers/fluentd-elasticsearch:v2.4.0
# vim es-statefulset.yaml
The container name and image in the upstream manifest are wrong; change them to the following form:
serviceAccountName: elasticsearch-logging
containers:
- name: elasticsearch-logging
#image: gcr.io/fluentd-elasticsearch/elasticsearch:v6.6.1
image: docker.elastic.co/elasticsearch/elasticsearch:6.6.1
#gcr.azk8s.cn/fluentd-elasticsearch/elasticsearch:v6.6.1
Apply the manifest files
[root@kube-node1 fluentd-elasticsearch]# pwd
/opt/k8s/work/kubernetes/cluster/addons/fluentd-elasticsearch
[root@kube-node1 fluentd-elasticsearch]# ls *.yaml
es-service.yaml es-statefulset.yaml fluentd-es-configmap.yaml fluentd-es-ds.yaml kibana-deployment.yaml kibana-service.yaml
# kubectl apply -f .
Check the results
# result in the ideal case
[root@kube-node1 fluentd-elasticsearch]# kubectl get pods -n kube-system -o wide|grep -E 'elasticsearch|fluentd|kibana'
elasticsearch-logging-0 1/1 Running 1 92s 172.30.24.6 kube-node1 <none> <none>
elasticsearch-logging-1 1/1 Running 1 85s 172.30.40.6 kube-node2 <none> <none>
fluentd-es-v2.4.0-k72m9 1/1 Running 0 91s 172.30.200.7 kube-node3 <none> <none>
fluentd-es-v2.4.0-klvbr 1/1 Running 0 91s 172.30.24.7 kube-node1 <none> <none>
fluentd-es-v2.4.0-pcq8p 1/1 Running 0 91s 172.30.40.5 kube-node2 <none> <none>
kibana-logging-f4d99b69f-779gm 1/1 Running 0 91s 172.30.200.6 kube-node3 <none> <none>
# result in a less ideal case:
# only one of the two elasticsearch-logging pods is healthy, and two of the three fluentd-es pods;
# after a while the failing ones recover while the previously healthy ones start failing;
# the initial diagnosis is high load average on the hosts where the problems occur
[root@kube-node1 fluentd-elasticsearch]# kubectl get pods -n kube-system -o wide|grep -E 'elasticsearch|fluentd|kibana'
elasticsearch-logging-0 1/1 Running 0 16m 172.30.48.3 kube-node2 <none> <none>
elasticsearch-logging-1 0/1 CrashLoopBackOff 7 15m 172.30.24.6 kube-node1 <none> <none>
fluentd-es-v2.4.0-lzcl7 1/1 Running 0 16m 172.30.96.3 kube-node3 <none> <none>
fluentd-es-v2.4.0-mm6gs 0/1 CrashLoopBackOff 5 16m 172.30.48.4 kube-node2 <none> <none>
fluentd-es-v2.4.0-vx5vj 1/1 Running 0 16m 172.30.24.3 kube-node1 <none> <none>
kibana-logging-f4d99b69f-6kjlr 1/1 Running 0 16m 172.30.96.5 kube-node3 <none> <none>
[root@kube-node1 fluentd-elasticsearch]# kubectl get service -n kube-system|grep -E 'elasticsearch|kibana'
elasticsearch-logging ClusterIP 10.254.202.87 <none> 9200/TCP 116s
kibana-logging ClusterIP 10.254.185.3 <none> 5601/TCP 114s
On first start the kibana Pod takes a long time (up to ~20 minutes) to optimize and cache the status pages; tail the Pod's log to watch the progress:
$ kubectl logs kibana-logging-7445dc9757-pvpcv -n kube-system -f
{"type":"log","@timestamp":"2019-05-26T11:36:18Z","tags":["info","optimize"],"pid":1,"message":"Optimizing and caching bundles for graph, ml, kibana, stateSessionStorageRedirect, timelion and status_page. This may take a few minutes"}
{"type":"log","@timestamp":"2019-05-26T11:40:03Z","tags":["info","optimize"],"pid":1,"message":"Optimization of bundles for graph, ml, kibana, stateSessionStorageRedirect, timelion and status_page complete in 224.57 seconds"}
Note: the kibana dashboard can only be viewed in a browser after the Kibana pod has finished starting; before that, connections are refused (see the wait sketch below).
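Instead of polling by hand, you can block until the pod is Ready; a sketch, assuming the upstream manifest's k8s-app=kibana-logging label:
```bash
kubectl -n kube-system wait --for=condition=Ready pod \
  -l k8s-app=kibana-logging --timeout=20m
```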
Access kibana
- Through kube-apiserver:
(use this method)
```bash
[root@kube-node1 fluentd-elasticsearch]# kubectl cluster-info|grep -E 'Elasticsearch|Kibana'
Elasticsearch is running at https://127.0.0.1:8443/api/v1/namespaces/kube-system/services/elasticsearch-logging/proxy
Kibana is running at https://127.0.0.1:8443/api/v1/namespaces/kube-system/services/kibana-logging/proxy
```
Browse to: https://192.168.75.111:6443/api/v1/namespaces/kube-system/services/kibana-logging/proxy
With VirtualBox port forwarding in place: http://127.0.0.1:8080/api/v1/namespaces/kube-system/services/kibana-logging/proxy
- Through kubectl proxy
(skip this step)
Create the proxy:
$ kubectl proxy --address='172.27.137.240' --port=8086 --accept-hosts='^*$'
Starting to serve on 172.27.129.150:8086
Browse to:
http://172.27.137.240:8086/api/v1/namespaces/kube-system/services/kibana-logging/proxy
With VirtualBox port forwarding in place:
http://127.0.0.1:8086/api/v1/namespaces/kube-system/services/kibana-logging/proxy
On the Management -> Indices page create an index (the equivalent of a database in mysql): select Index contains time-based events, use the default logstash-* pattern, and click Create (this step puts a heavy load on the node where it runs). A few minutes after the index is created, the logs aggregated in ElasticSearch appear under the Discover menu.
High load average shows up as the kswapd0 process consuming excessive CPU; the underlying meaning is that the host is short of physical memory.
10. Deploying a private docker registry
Note: this step is not performed here; the private registry is deployed with Harbor instead.
Note: this document describes deploying a private registry with docker's official registry v2 image; you can also deploy a Harbor private registry (see: deploying the Harbor private registry).
This document covers deploying a private docker registry with TLS encryption, HTTP Basic authentication, and ceph rgw as the backend storage; with a different backend, you can start from the "Create the docker registry" section;
The two example machines' IPs are:
- ceph rgw: 172.27.132.66
- docker registry: 172.27.132.67
Deploy the ceph RGW node
$ ceph-deploy rgw create 172.27.132.66 # rgw listens on port 7480 by default
$
Create a test account demo
$ radosgw-admin user create --uid=demo --display-name="ceph rgw demo user"
$
Create the swift subuser of the demo account
The registry currently only supports accessing ceph rgw storage over the swift protocol; the s3 protocol is not yet supported;
$ radosgw-admin subuser create --uid demo --subuser=demo:swift --access=full --secret=secretkey --key-type=swift
$
Create the secret key for the demo:swift subuser
$ radosgw-admin key create --subuser=demo:swift --key-type=swift --gen-secret
{
"user_id": "demo",
"display_name": "ceph rgw demo user",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [
{
"id": "demo:swift",
"permissions": "full-control"
}
],
"keys": [
{
"user": "demo",
"access_key": "5Y1B1SIJ2YHKEHO5U36B",
"secret_key": "nrIvtPqUj7pUlccLYPuR3ntVzIa50DToIpe7xFjT"
}
],
"swift_keys": [
{
"user": "demo:swift",
"secret_key": "ttQcU1O17DFQ4I9xzKqwgUe7WIYYX99zhcIfU9vb"
}
],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"temp_url_keys": []
}
ttQcU1O17DFQ4I9xzKqwgUe7WIYYX99zhcIfU9vb is the secret key of the demo:swift subuser;
Create the docker registry
Create the x509 certificate used by the registry
$ mkdir -p registry/{auth,certs}
$ cat > registry-csr.json <<EOF
{
"CN": "registry",
"hosts": [
"127.0.0.1",
"172.27.132.67"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "4Paradigm"
}
]
}
EOF
$ cfssl gencert -ca=/etc/kubernetes/cert/ca.pem \
-ca-key=/etc/kubernetes/cert/ca-key.pem \
-config=/etc/kubernetes/cert/ca-config.json \
-profile=kubernetes registry-csr.json | cfssljson -bare registry
$ cp registry.pem registry-key.pem registry/certs
$
- this reuses the CA certificate and key files created earlier;
- the hosts field specifies the registry's NodeIP;
Create the HTTP Basic authentication file
$ docker run --entrypoint htpasswd registry:2 -Bbn foo foo123 > registry/auth/htpasswd
$ cat registry/auth/htpasswd
foo:$2y$05$iZaM45Jxlcg0DJKXZMggLOibAsHLGybyU.CgU9AHqWcVDyBjiScN.
Configure the registry parameters
export RGW_AUTH_URL="http://172.27.132.66:7480/auth/v1"
export RGW_USER="demo:swift"
export RGW_SECRET_KEY="ttQcU1O17DFQ4I9xzKqwgUe7WIYYX99zhcIfU9vb"
cat > config.yml << EOF
# https://docs.docker.com/registry/configuration/#list-of-configuration-options
version: 0.1
log:
level: info
formatter: text
fields:
service: registry
storage:
cache:
blobdescriptor: inmemory
delete:
enabled: true
swift:
authurl: ${RGW_AUTH_URL}
username: ${RGW_USER}
password: ${RGW_SECRET_KEY}
container: registry
auth:
htpasswd:
realm: basic-realm
path: /auth/htpasswd
http:
addr: 0.0.0.0:8000
headers:
X-Content-Type-Options: [nosniff]
tls:
certificate: /certs/registry.pem
key: /certs/registry-key.pem
health:
storagedriver:
enabled: true
interval: 10s
threshold: 3
EOF
[k8s@zhangjun-k8s01 cert]$ cp config.yml registry
[k8s@zhangjun-k8s01 cert]$ scp -r registry 172.27.132.67:/opt/k8s
- storage.swift selects a backend that speaks the swift interface protocol; the parameters here are for the ceph rgw storage;
- auth.htpasswd specifies the path of the HTTP Basic authentication token file;
- http.tls specifies the certificate and key file paths for the registry's http server;
Create the docker registry:
ssh k8s@172.27.132.67
$ docker run -d -p 8000:8000 --privileged \
-v /opt/k8s/registry/auth/:/auth \
-v /opt/k8s/registry/certs:/certs \
-v /opt/k8s/registry/config.yml:/etc/docker/registry/config.yml \
--name registry registry:2
- the machine that runs this docker run command has IP 172.27.132.67;
Push an image to the registry
Copy the CA certificate that signed the registry certificate into the /etc/docker/certs.d/172.27.132.67:8000 directory:
[k8s@zhangjun-k8s01 cert]$ sudo mkdir -p /etc/docker/certs.d/172.27.132.67:8000
[k8s@zhangjun-k8s01 cert]$ sudo cp /etc/kubernetes/cert/ca.pem /etc/docker/certs.d/172.27.132.67:8000/ca.crt
Log in to the private registry:
$ docker login 172.27.132.67:8000
Username: foo
Password:
Login Succeeded
The login credentials are written to the ~/.docker/config.json file:
$ cat ~/.docker/config.json
{
"auths": {
"172.27.132.67:8000": {
"auth": "Zm9vOmZvbzEyMw=="
}
}
}
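The auth value is simply base64("username:password"), which is easy to verify from the shell:
```bash
echo -n 'foo:foo123' | base64        # -> Zm9vOmZvbzEyMw==
echo 'Zm9vOmZvbzEyMw==' | base64 -d  # -> foo:foo123
```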
Tag the local image for the private registry:
$ docker tag prom/node-exporter:v0.16.0 172.27.132.67:8000/prom/node-exporter:v0.16.0
$ docker images |grep node-exporter
prom/node-exporter:v0.16.0 latest f9d5de079539 2 years ago 239.8 kB
172.27.132.67:8000/prom/node-exporter:v0.16.0 latest f9d5de079539 2 years ago 239.8 kB
Push the image to the private registry:
$ docker push 172.27.132.67:8000/prom/node-exporter:v0.16.0
The push refers to a repository [172.27.132.67:8000/prom/node-exporter:v0.16.0]
5f70bf18a086: Pushed
e16a89738269: Pushed
latest: digest: sha256:9a6b437e896acad3f5a2a8084625fdd4177b2e7124ee943af642259f2f283359 size: 916
Check that the pushed node-exporter image files are present on ceph:
$ rados lspools
rbd
cephfs_data
cephfs_metadata
.rgw.root
k8s
default.rgw.control
default.rgw.meta
default.rgw.log
default.rgw.buckets.index
default.rgw.buckets.data
$ rados --pool default.rgw.buckets.data ls|grep node-exporter
1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_layers/sha256/cdb7590af5f064887f3d6008d46be65e929c74250d747813d85199e04fc70463/link
1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_manifests/revisions/sha256/55302581333c43d540db0e144cf9e7735423117a733cdec27716d87254221086/link
1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_manifests/tags/v0.16.0/current/link
1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_manifests/tags/v0.16.0/index/sha256/55302581333c43d540db0e144cf9e7735423117a733cdec27716d87254221086/link
1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_layers/sha256/224a21997e8ca8514d42eb2ed98b19a7ee2537bce0b3a26b8dff510ab637f15c/link
1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_layers/sha256/528dda9cf23d0fad80347749d6d06229b9a19903e49b7177d5f4f58736538d4e/link
1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_layers/sha256/188af75e2de0203eac7c6e982feff45f9c340eaac4c7a0f59129712524fa2984/link
Operating the private registry
List the images in the private registry
$ curl --user foo:foo123 --cacert /etc/docker/certs.d/172.27.132.67\:8000/ca.crt https://172.27.132.67:8000/v2/_catalog
{"repositories":["prom/node-exporter"]}
List an image's tags
$ curl --user foo:foo123 --cacert /etc/docker/certs.d/172.27.132.67\:8000/ca.crt https://172.27.132.67:8000/v2/prom/node-exporter/tags/list
{"name":"prom/node-exporter","tags":["v0.16.0"]}
Get the digest of an image or layer
Send a GET request to v2/<repoName>/manifests/<tagName>; the image digest is in the Docker-Content-Digest response header, and the layer digests are in fsLayers.blobSum of the response body.
Note that the request must include the header Accept: application/vnd.docker.distribution.manifest.v2+json:
$ curl -v -H "Accept: application/vnd.docker.distribution.manifest.v2+json" --user foo:foo123 --cacert /etc/docker/certs.d/172.27.132.67\:8000/ca.crt https://172.27.132.67:8000/v2/prom/node-exporter/manifests/v0.16.0
* About to connect() to 172.27.132.67 port 8000 (#0)
* Trying 172.27.132.67...
* Connected to 172.27.132.67 (172.27.132.67) port 8000 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/docker/certs.d/172.27.132.67:8000/ca.crt
CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
* subject: CN=registry,OU=4Paradigm,O=k8s,L=BeiJing,ST=BeiJing,C=CN
* start date: Jul 05 12:52:00 2018 GMT
* expire date: Jul 02 12:52:00 2028 GMT
* common name: registry
* issuer: CN=kubernetes,OU=4Paradigm,O=k8s,L=BeiJing,ST=BeiJing,C=CN
* Server auth using Basic with user 'foo'
> GET /v2/prom/node-exporter/manifests/v0.16.0 HTTP/1.1
> Authorization: Basic Zm9vOmZvbzEyMw==
> User-Agent: curl/7.29.0
> Host: 172.27.132.67:8000
> Accept: application/vnd.docker.distribution.manifest.v2+json
>
< HTTP/1.1 200 OK
< Content-Length: 949
< Content-Type: application/vnd.docker.distribution.manifest.v2+json
< Docker-Content-Digest: sha256:55302581333c43d540db0e144cf9e7735423117a733cdec27716d87254221086
< Docker-Distribution-Api-Version: registry/2.0
< Etag: "sha256:55302581333c43d540db0e144cf9e7735423117a733cdec27716d87254221086"
< X-Content-Type-Options: nosniff
< Date: Fri, 06 Jul 2018 06:18:41 GMT
<
{
"schemaVersion": 2,
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"config": {
"mediaType": "application/vnd.docker.container.image.v1+json",
"size": 3511,
"digest": "sha256:188af75e2de0203eac7c6e982feff45f9c340eaac4c7a0f59129712524fa2984"
},
"layers": [
{
"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
"size": 2392417,
"digest": "sha256:224a21997e8ca8514d42eb2ed98b19a7ee2537bce0b3a26b8dff510ab637f15c"
},
{
"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
"size": 560703,
"digest": "sha256:cdb7590af5f064887f3d6008d46be65e929c74250d747813d85199e04fc70463"
},
{
"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
"size": 5332460,
"digest": "sha256:528dda9cf23d0fad80347749d6d06229b9a19903e49b7177d5f4f58736538d4e"
}
]
Delete an image
Send a DELETE request to /v2/<name>/manifests/<reference>, where reference is the Docker-Content-Digest value returned in the previous step:
$ curl -X DELETE --user foo:foo123 --cacert /etc/docker/certs.d/172.27.132.67\:8000/ca.crt https://172.27.132.67:8000/v2/prom/node-exporter/manifests/sha256:68effe31a4ae8312e47f54bec52d1fc925908009ce7e6f734e1b54a4169081c5
$
Delete a layer
Send a DELETE request to /v2/<name>/blobs/<digest>, where digest is the fsLayers.blobSum value returned in the previous step:
$ curl -X DELETE --user foo:foo123 --cacert /etc/docker/certs.d/172.27.132.67\:8000/ca.crt https://172.27.132.67:8000/v2/prom/node-exporter/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
$ curl -X DELETE --cacert /etc/docker/certs.d/172.27.132.67\:8000/ca.crt https://172.27.132.67:8000/v2/prom/node-exporter/blobs/sha256:04176c8b224aa0eb9942af765f66dae866f436e75acef028fe44b8a98e045515
$
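Note that these DELETE requests only drop references; with the registry:2 image, backend storage is reclaimed by a separate garbage-collect run. A sketch, using the container name and config path from the docker run above:
```bash
docker exec registry bin/registry garbage-collect /etc/docker/registry/config.yml
```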
Common problems
login fails with 416
Running the s3test.py program from http://docs.ceph.com/docs/master/install/install-ceph-gateway/ fails:
[k8s@zhangjun-k8s01 cert]$ python s3test.py
Traceback (most recent call last):
  File "s3test.py", line 12, in <module>
    bucket = conn.create_bucket('my-new-bucket')
  File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 625, in create_bucket
    response.status, response.reason, body)
boto.exception.S3ResponseError: S3ResponseError: 416 Requested Range Not Satisfiable
Workaround:
- modify ceph.conf on the admin node
- push the config: ceph-deploy config push zhangjun-k8s01 zhangjun-k8s02 zhangjun-k8s03
- restart the services: systemctl restart 'ceph-mds@zhangjun-k8s03.service'; systemctl restart ceph-osd@0; systemctl restart 'ceph-mon@zhangjun-k8s01.service'; systemctl restart 'ceph-mgr@zhangjun-k8s01.service'
For anyone who is hitting this issue, set default pg_num and pgp_num to a lower value (8 for example), or set mon_max_pg_per_osd to a high value in ceph.conf. radosgw-admin doesn't throw a proper error when internal pool creation fails, hence the upper-level error, which is very confusing.
https://tracker.ceph.com/issues/21497
login fails with 503
[root@zhangjun-k8s01 ~]# docker login 172.27.132.67:8000
Username: foo
Password:
Error response from daemon: login attempt to https://172.27.132.67:8000/v2/ failed with status: 503 Service Unavailable
Cause: the docker run command was missing the --privileged flag
11. Deploying the harbor private registry
This document describes deploying the harbor private registry with docker-compose; you can also deploy a private registry with docker's official registry image (see: deploying a Docker Registry).
Variables used
The variables used in this document are defined as follows:
# used later; set it to the IP of the node harbor is deployed on (10.64.3.7 is the upstream document's example value)
export NODE_IP=10.64.3.7 # IP of the node deploying harbor
Download the files
Download the latest docker-compose binary from the docker compose release page:
cd /opt/k8s/work
wget https://github.com/docker/compose/releases/download/1.21.2/docker-compose-Linux-x86_64
mv docker-compose-Linux-x86_64 /opt/k8s/bin/docker-compose
chmod a+x /opt/k8s/bin/docker-compose
export PATH=/opt/k8s/bin:$PATH
Download the latest harbor offline installer from the harbor release page:
cd /opt/k8s/work
wget --continue https://storage.googleapis.com/harbor-releases/release-1.5.0/harbor-offline-installer-v1.5.1.tgz
tar -xzvf harbor-offline-installer-v1.5.1.tgz
Import the docker images
Load the harbor docker images from the offline installer:
cd harbor
docker load -i harbor.v1.5.1.tar.gz
Create the x509 certificate for harbor's nginx server
Create the harbor certificate signing request:
cd /opt/k8s/work
cat > harbor-csr.json <<EOF
{
"CN": "harbor",
"hosts": [
"127.0.0.1",
"${NODE_IP}" ### 前面未設置環境變量的話可以直接寫死
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BeiJing",
"O": "k8s",
"OU": "4Paradigm"
}
]
}
EOF
- the hosts field authorizes the certificate for the current deploy node's IP; add the domain name as well if harbor will later be accessed by domain;
Generate the harbor certificate and private key:
cd /opt/k8s/work
cfssl gencert -ca=/etc/kubernetes/cert/ca.pem \
-ca-key=/etc/kubernetes/cert/ca-key.pem \
-config=/etc/kubernetes/cert/ca-config.json \
-profile=kubernetes harbor-csr.json | cfssljson -bare harbor
ls harbor*
harbor.csr harbor-csr.json harbor-key.pem harbor.pem
mkdir -p /etc/harbor/ssl
cp harbor*.pem /etc/harbor/ssl
Modify the harbor.cfg file
cd /opt/k8s/work/harbor
cp harbor.cfg{,.bak} # back up the config file
vim harbor.cfg
hostname = 172.27.129.81
ui_url_protocol = https
ssl_cert = /etc/harbor/ssl/harbor.pem
ssl_cert_key = /etc/harbor/ssl/harbor-key.pem
cp prepare{,.bak}
vim prepare
Change empty_subj = "/C=/ST=/L=/O=/CN=/" to empty_subj = "/" (a sed one-liner for this edit follows below)
- the empty_subj parameter of the prepare script must be changed, otherwise the later install step exits with the error:
Fail to generate key file: ./common/config/ui/private_key.pem, cert file: ./common/config/registry/root.crt
Reference: https://github.com/vmware/harbor/issues/2920
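The same edit can be applied non-interactively; a sed sketch:
```bash
# keeps a prepare.bak2 backup alongside the manual one made above
sed -i.bak2 's|empty_subj = "/C=/ST=/L=/O=/CN=/"|empty_subj = "/"|' prepare
```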
Load and start the harbor images
cd /opt/k8s/work/harbor
mkdir -p /data # holds logs and related data; consider moving it to another path later
./install.sh
[Step 0]: checking installation environment ...
Note: docker version: 18.03.0
Note: docker-compose version: 1.21.2
[Step 1]: loading Harbor images ...
Loaded image: vmware/clair-photon:v2.0.1-v1.5.1
Loaded image: vmware/postgresql-photon:v1.5.1
Loaded image: vmware/harbor-adminserver:v1.5.1
Loaded image: vmware/registry-photon:v2.6.2-v1.5.1
Loaded image: vmware/photon:1.0
Loaded image: vmware/harbor-migrator:v1.5.1
Loaded image: vmware/harbor-ui:v1.5.1
Loaded image: vmware/redis-photon:v1.5.1
Loaded image: vmware/nginx-photon:v1.5.1
Loaded image: vmware/mariadb-photon:v1.5.1
Loaded image: vmware/notary-signer-photon:v0.5.1-v1.5.1
Loaded image: vmware/harbor-log:v1.5.1
Loaded image: vmware/harbor-db:v1.5.1
Loaded image: vmware/harbor-jobservice:v1.5.1
Loaded image: vmware/notary-server-photon:v0.5.1-v1.5.1
[Step 2]: preparing environment ...
loaded secret from file: /data/secretkey
Generated configuration file: ./common/config/nginx/nginx.conf
Generated configuration file: ./common/config/adminserver/env
Generated configuration file: ./common/config/ui/env
Generated configuration file: ./common/config/registry/config.yml
Generated configuration file: ./common/config/db/env
Generated configuration file: ./common/config/jobservice/env
Generated configuration file: ./common/config/jobservice/config.yml
Generated configuration file: ./common/config/log/logrotate.conf
Generated configuration file: ./common/config/jobservice/config.yml
Generated configuration file: ./common/config/ui/app.conf
Generated certificate, key file: ./common/config/ui/private_key.pem, cert file: ./common/config/registry/root.crt
The configuration files are ready, please use docker-compose to start the service.
[Step 3]: checking existing instance of Harbor ...
[Step 4]: starting Harbor ...
Creating network "harbor_harbor" with the default driver
Creating harbor-log ... done
Creating redis ... done
Creating harbor-adminserver ... done
Creating harbor-db ... done
Creating registry ... done
Creating harbor-ui ... done
Creating harbor-jobservice ... done
Creating nginx ... done
✔ ----Harbor has been installed and started successfully.----
Now you should be able to visit the admin portal at https://192.168.75.110.
For more details, please visit https://github.com/vmware/harbor .
Access the admin portal
Confirm all components are working properly:
[root@kube-node1 harbor]# docker-compose ps
Name Command State Ports
-------------------------------------------------------------------------------------------------------------------------------------
harbor-adminserver /harbor/start.sh Up (healthy)
harbor-db /usr/local/bin/docker-entr ... Up (healthy) 3306/tcp
harbor-jobservice /harbor/start.sh Up
harbor-log /bin/sh -c /usr/local/bin/ ... Up (healthy) 127.0.0.1:1514->10514/tcp
harbor-ui /harbor/start.sh Up (healthy)
nginx nginx -g daemon off; Up (healthy) 0.0.0.0:443->443/tcp, 0.0.0.0:4443->4443/tcp, 0.0.0.0:80->80/tcp
redis docker-entrypoint.sh redis ... Up 6379/tcp
registry /entrypoint.sh serve /etc/ ... Up (healthy) 5000/tcp
Browse to https://192.168.75.110;
log in with the account admin and the default password Harbor12345 from the harbor.cfg configuration file.
Files and directories produced by harbor at runtime
harbor writes its logs under /var/log/harbor; docker logs XXX or docker-compose logs XXX will not show the containers' logs.
# log directory
ls /var/log/harbor
adminserver.log jobservice.log mysql.log proxy.log registry.log ui.log
# data directory, including the database and the image repository
ls /data/
ca_download config database job_logs registry secretkey
Changing the default data directory and similar settings
# change the "secretkey" path
vim harbor.cfg
#The path of secretkey storage
secretkey_path = /data/harbor-data # default is /data
# change every volume mount path that previously defaulted to "/data"
vim docker-compose.yml
# after making the changes above, redeploy the containers with:
./prepare
docker-compose up -d
# Note: do not hand-edit the contents of the mounted paths above while deployed. If they must be changed, only do so after the containers are fully removed (docker-compose down).
docker client login
Copy the CA certificate that signed the harbor certificate to the designated directory on the client. Suppose the Harbor registry is deployed on host 192.168.75.110 and host 192.168.75.111 wants to log in remotely.
# on 192.168.75.111, create the directory for the registry's CA. Note the trailing path component: use the IP if the registry address is an IP, or the domain if it is a domain
mkdir -p /etc/docker/certs.d/192.168.75.110
# on 192.168.75.110, copy the CA certificate into the directory created above on the client, renaming it ca.crt
scp /etc/kubernetes/cert/ca.pem root@192.168.75.111:/etc/docker/certs.d/192.168.75.110/ca.crt
Log in to harbor
# docker login https://192.168.75.110
Username: admin
Password: Harbor12345 # default password
The credentials are saved automatically to the ~/.docker/config.json file.
Other operations
The working directory for all the operations below is the harbor directory produced by extracting the offline installer.
# changing the image storage path, log path, etc. uses these steps; see "Changing the default data directory" above
# stop harbor
docker-compose down -v
# edit the configuration
vim harbor.cfg
# propagate the changed configuration to the docker-compose.yml file
./prepare
Clearing the configuration file: ./common/config/ui/app.conf
Clearing the configuration file: ./common/config/ui/env
Clearing the configuration file: ./common/config/ui/private_key.pem
Clearing the configuration file: ./common/config/db/env
Clearing the configuration file: ./common/config/registry/root.crt
Clearing the configuration file: ./common/config/registry/config.yml
Clearing the configuration file: ./common/config/jobservice/app.conf
Clearing the configuration file: ./common/config/jobservice/env
Clearing the configuration file: ./common/config/nginx/cert/admin.pem
Clearing the configuration file: ./common/config/nginx/cert/admin-key.pem
Clearing the configuration file: ./common/config/nginx/nginx.conf
Clearing the configuration file: ./common/config/adminserver/env
loaded secret from file: /data/secretkey
Generated configuration file: ./common/config/nginx/nginx.conf
Generated configuration file: ./common/config/adminserver/env
Generated configuration file: ./common/config/ui/env
Generated configuration file: ./common/config/registry/config.yml
Generated configuration file: ./common/config/db/env
Generated configuration file: ./common/config/jobservice/env
Generated configuration file: ./common/config/jobservice/app.conf
Generated configuration file: ./common/config/ui/app.conf
Generated certificate, key file: ./common/config/ui/private_key.pem, cert file: ./common/config/registry/root.crt
The configuration files are ready, please use docker-compose to start the service.
chmod -R 666 common ## make sure the container processes have permission to read the generated configuration
# start harbor
docker-compose up -d
12. Cleaning up the cluster
Clean up the worker nodes
Stop the related services:
systemctl stop kubelet kube-proxy flanneld docker kube-nginx
Clean up files:
source /opt/k8s/bin/environment.sh
# unmount the directories mounted by kubelet and docker
mount | grep "${K8S_DIR}" | awk '{print $3}'|xargs sudo umount
# delete the kubelet working directory
rm -rf ${K8S_DIR}/kubelet
# delete the docker working directory
rm -rf ${DOCKER_DIR}
# delete the network config files written by flanneld
rm -rf /var/run/flannel/
# delete docker's runtime files
rm -rf /var/run/docker/
# delete the systemd unit files
rm -rf /etc/systemd/system/{kubelet,docker,flanneld,kube-nginx}.service
# delete the binaries
rm -rf /opt/k8s/bin/*
# delete the certificate files
rm -rf /etc/flanneld/cert /etc/kubernetes/cert
Clean up the iptables rules created by kube-proxy and docker:
iptables -F && sudo iptables -X && sudo iptables -F -t nat && sudo iptables -X -t nat
Delete the bridges created by flanneld and docker:
ip link del flannel.1
ip link del docker0
Clean up the master nodes
Stop the related services:
systemctl stop kube-apiserver kube-controller-manager kube-scheduler kube-nginx
Clean up files:
# delete the systemd unit files
rm -rf /etc/systemd/system/{kube-apiserver,kube-controller-manager,kube-scheduler,kube-nginx}.service
# delete the binaries
rm -rf /opt/k8s/bin/{kube-apiserver,kube-controller-manager,kube-scheduler}
# delete the certificate files
rm -rf /etc/flanneld/cert /etc/kubernetes/cert
Clean up the etcd cluster
Stop the related service:
systemctl stop etcd
Clean up files:
source /opt/k8s/bin/environment.sh
# delete etcd's working and data directories
rm -rf ${ETCD_DATA_DIR} ${ETCD_WAL_DIR}
# delete the systemd unit file
rm -rf /etc/systemd/system/etcd.service
# delete the binary
rm -rf /opt/k8s/bin/etcd
# delete the x509 certificate files
rm -rf /etc/etcd/cert/*
A. Accessing the kube-apiserver secure port from a browser
When a browser opens kube-apiserver's secure port 6443, it warns that the certificate is not trusted:
This is because kube-apiserver's server certificate is signed by our own root certificate ca.pem; ca.pem must be imported into the operating system and marked permanently trusted.
On a Mac, proceed as follows:
On Windows, import ca.pem with the following command:
keytool -import -v -trustcacerts -alias appmanagement -file "PATH...\\ca.pem" -storepass password -keystore cacerts
Visiting the apiserver address again, the certificate is now trusted, but a 401 Unauthorized is returned:
Note: start operating from this point
We need to generate a client certificate for the browser, to be used when accessing apiserver's https port 6443.
Here we use the admin certificate and key created while deploying the kubectl command-line tool, together with the ca certificate above, to create a PKCS#12/PFX format certificate the browser can use:
$ openssl pkcs12 -export -out admin.pfx -inkey admin-key.pem -in admin.pem -certfile ca.pem
# press Enter at every password prompt, leaving the passwords empty
# on Windows, simply import the generated certificate in the browser's settings
Import the created admin.pfx into the system certificate store. On a Mac, proceed as follows:
Restart the browser and visit the apiserver address again; when prompted to choose a browser certificate, select the admin.pfx imported above:
This time, access to kube-apiserver's secure port is authorized:
How the client chooses a certificate
- certificate selection is negotiated during the SSL/TLS handshake between client and server;
- if the server requires a client certificate, it sends the client a list of CAs it accepts during the handshake;
- the client searches its certificate store (usually the operating system's; the keychain on a Mac) for certificates signed by those CAs, and offers the matches to the user to choose from (certificate plus private key);
- the user selects a certificate/private key, which the client then uses to talk to the server;
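The CA list the server advertises can be observed directly during a handshake; a sketch against the apiserver's secure port:
```bash
# openssl prints the list under "Acceptable client certificate CA names"
openssl s_client -connect 192.168.75.110:6443 </dev/null 2>/dev/null \
  | grep -A 2 'Acceptable client certificate CA names'
```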
References
- https://github.com/kubernetes/kubernetes/issues/31665
- https://www.sslshopper.com/ssl-converter.html
- https://stackoverflow.com/questions/40847638/how-chrome-browser-know-which-client-certificate-to-prompt-for-a-site
B. Verifying certificates
Take the kubernetes certificate (generated later while deploying the master nodes) as an example:
Using the openssl command
$ openssl x509 -noout -text -in kubernetes.pem
...
Signature Algorithm: sha256WithRSAEncryption
Issuer: C=CN, ST=BeiJing, L=BeiJing, O=k8s, OU=System, CN=Kubernetes
Validity
Not Before: Apr 5 05:36:00 2017 GMT
Not After : Apr 5 05:36:00 2018 GMT
Subject: C=CN, ST=BeiJing, L=BeiJing, O=k8s, OU=System, CN=kubernetes
...
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
X509v3 Extended Key Usage:
TLS Web Server Authentication, TLS Web Client Authentication
X509v3 Basic Constraints: critical
CA:FALSE
X509v3 Subject Key Identifier:
DD:52:04:43:10:13:A9:29:24:17:3A:0E:D7:14:DB:36:F8:6C:E0:E0
X509v3 Authority Key Identifier:
keyid:44:04:3B:60:BD:69:78:14:68:AF:A0:41:13:F6:17:07:13:63:58:CD
X509v3 Subject Alternative Name:
DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster, DNS:kubernetes.default.svc.cluster.local, IP Address:127.0.0.1, IP Address:10.64.3.7, IP Address:10.254.0.1
...
- confirm the Issuer field matches ca-csr.json;
- confirm the Subject field matches kubernetes-csr.json;
- confirm the X509v3 Subject Alternative Name field matches kubernetes-csr.json;
- confirm the X509v3 Key Usage and Extended Key Usage fields match the kubernetes profile in ca-config.json;
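The signature chain itself can also be checked mechanically with openssl, complementing the manual field checks above:
```bash
# prints "kubernetes.pem: OK" when the certificate was signed by ca.pem
openssl verify -CAfile ca.pem kubernetes.pem
```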
Using the cfssl-certinfo command
$ cfssl-certinfo -cert kubernetes.pem
...
{
"subject": {
"common_name": "kubernetes",
"country": "CN",
"organization": "k8s",
"organizational_unit": "System",
"locality": "BeiJing",
"province": "BeiJing",
"names": [
"CN",
"BeiJing",
"BeiJing",
"k8s",
"System",
"kubernetes"
]
},
"issuer": {
"common_name": "Kubernetes",
"country": "CN",
"organization": "k8s",
"organizational_unit": "System",
"locality": "BeiJing",
"province": "BeiJing",
"names": [
"CN",
"BeiJing",
"BeiJing",
"k8s",
"System",
"Kubernetes"
]
},
"serial_number": "174360492872423263473151971632292895707129022309",
"sans": [
"kubernetes",
"kubernetes.default",
"kubernetes.default.svc",
"kubernetes.default.svc.cluster",
"kubernetes.default.svc.cluster.local",
"127.0.0.1",
"10.64.3.7",
"10.64.3.8",
"10.66.3.86",
"10.254.0.1"
],
"not_before": "2017-04-05T05:36:00Z",
"not_after": "2018-04-05T05:36:00Z",
"sigalg": "SHA256WithRSA",
...