Deploying Longhorn on a highly available k8s cluster

Reposted article, 2021-03-07 16:06, category: Linux

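Before deploying Longhorn you need a k8s cluster, so let me first cover how to deploy one. Getting Longhorn running took me several days, and this post is a record of that painful process. The cluster deployment follows the deployment steps from 馬哥 (Mage).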

Cluster deployment

Preparation before cluster deployment

Cluster environment:

Role	IP	OS	Docker version	Kernel version
master1	10.228.81.119	centos7	19.03.5	3.10.0-1062.el7.x86_64
master2	10.228.81.118	centos7	19.03.5	3.10.0-1062.el7.x86_64
master3	10.228.81.128	centos7	19.03.5	3.10.0-1062.el7.x86_64
node1	10.228.81.130	centos7	19.03.5	3.10.0-1062.el7.x86_64
node2	10.228.81.131	centos7	19.03.5	3.10.0-1062.el7.x86_64
node3	10.228.81.135	centos7	19.03.5	3.10.0-1062.el7.x86_64

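Remember to set the hostname on each machine; the command is hostnamectl set-hostname <name>

The steps below are done on master1 only. The commands that must run on every machine come later, so keep reading.

# Edit the hosts file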
cat >>/etc/hosts <<EOF
10.228.81.119 master1
10.228.81.130 node1
10.228.81.131 node2
10.228.81.135 node3
10.228.81.118 master2
10.228.81.128 master3
EOF
# Set up passwordless SSH
ssh-keygen -t rsa -b 4096
ssh-copy-id -i ~/.ssh/id_rsa.pub  root@master1
ssh-copy-id -i ~/.ssh/id_rsa.pub  root@master2
ssh-copy-id -i ~/.ssh/id_rsa.pub  root@master3
ssh-copy-id -i ~/.ssh/id_rsa.pub  root@node1
ssh-copy-id -i ~/.ssh/id_rsa.pub  root@node2
ssh-copy-id -i ~/.ssh/id_rsa.pub  root@node3
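# Upgrade the kernel (optional). I skipped this step. The stock CentOS 7 kernel is indeed old, but when I first upgraded to the latest kernel the Longhorn installation failed, so a kernel upgrade is not required. If you do upgrade, avoid the very latest release and consider trying a 4.x kernel. I will not demonstrate it here.
# Configure kernel parameters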
cat > /etc/sysctl.d/k8s.conf << EOF
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
fs.may_detach_mounts = 1
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_watches=89100
fs.file-max=52706963
fs.nr_open=52706963
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 327680
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.netfilter.nf_conntrack_max = 65536
net.ipv4.tcp_timestamps = 0
net.core.somaxconn = 16384
EOF
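# Create helper scripts that loop over the hosts with scp. This step is optional; it only saves typing.
vim for_scp1.sh
#!/bin/bash
# Copy the given path to the same location on every other machine.
for i in master2 master3 node1 node2 node3;do
    scp -r "$1" "root@$i:$1"
done
vim for_scp2.sh
#!/bin/bash
# Copy the given path to the same location on the other masters.
for i in master2 master3;do
    scp -r "$1" "root@$i:$1"
done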
# Send the hosts file and the kernel parameter file to all the other machines
bash for_scp1.sh /etc/hosts
bash for_scp1.sh /etc/sysctl.d/k8s.conf
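If the hosts step is ever run a second time, the cat >> /etc/hosts heredoc appends every line again. A minimal sketch of an idempotent alternative follows. add_host_entry is a hypothetical helper name, not part of the original steps, and it assumes a hosts-style file with whitespace-separated "IP name" mappings.

```shell
# add_host_entry FILE IP NAME   (hypothetical helper, not from the original post)
# Append "IP NAME" to FILE unless NAME already appears as a host name
# on a non-comment line.
add_host_entry() {
    if awk -v n="$3" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$1" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$2" "$3" >> "$1"
}

# Example (as root): add_host_entry /etc/hosts 10.228.81.119 master1
```

Calling it once per machine in the table gives the same /etc/hosts content as the heredoc, but repeated runs leave the file unchanged.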

The following steps must be run on every machine

# Disable the firewall and SELinux
systemctl disable --now firewalld
setenforce 0
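# Note: check the config file after this change; if the SELINUX= line has leading spaces, the setting will not take effect
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
# Disable swap: by default the kubelet refuses to start while swap is enabled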
swapoff -a && sed -i.bak 's/^.*centos-swap/#&/g' /etc/fstab
# Apply the kernel parameters
sysctl --system
# After the configuration above, reboot once.
reboot
# Configure the Aliyun base and EPEL repositories
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo && sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo && curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
# Install the required tools
sudo yum install -y yum-utils device-mapper-persistent-data lvm2 bash-completion chrony
# Add the Docker package repository
sudo yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
sudo sed -i 's+download.docker.com+mirrors.aliyun.com/docker-ce+' /etc/yum.repos.d/docker-ce.repo
# List the available Docker CE versions:
yum list docker-ce.x86_64 --showduplicates | sort -r
# Install a pinned Docker version. I first installed the latest release and Longhorn kept failing with obscure errors; after several full days without finding a fix, an older version finally worked. kubeadm also prints a warning with newer Docker releases, so I recommend version 19.
yum install docker-ce-19.03.5 docker-ce-cli-19.03.5 -y
# Start Docker and enable it at boot, then adjust its configuration and restart it
systemctl enable --now docker
cat > /etc/docker/daemon.json << EOF
{
  "registry-mirrors": ["https://f1bhsuge.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
# Restart Docker and check docker info to confirm the change
systemctl restart docker && docker info
# Edit the chrony configuration to set up time synchronization
vim /etc/chrony.conf
# Use public servers from the pool.ntp.org project. (add the following lines below this line)
pool  ntp1.aliyun.com iburst
pool  ntp2.aliyun.com iburst
pool  ntp3.aliyun.com iburst
# Save and exit, restart chronyd, enable it at boot, and check its status
systemctl restart chronyd && systemctl enable chronyd && systemctl status chronyd
# Install kubelet, kubeadm and kubectl
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
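# List the available versions
yum list <package-name> --showduplicates
# I use version 1.19.8, again because of Longhorn. In general I suggest not installing the very latest release; far too many things turn out to be incompatible.
yum install -y kubectl-1.19.8 kubeadm-1.19.8 kubelet-1.19.8
# Enable kubelet at boot but do not start it yet; the high-availability layer comes first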
systemctl enable kubelet
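
sysctl --system reports an unknown key only as an error line in its output and then carries on, so a misspelled key in /etc/sysctl.d/k8s.conf is easy to miss. The sketch below lists keys that have no matching entry under /proc/sys. sysctl_key_path and check_sysctl_file are hypothetical helper names, and the sketch assumes plain "key = value" lines whose key components contain no dots of their own.

```shell
# sysctl_key_path KEY   (hypothetical helper)
# Print the /proc/sys path that a dotted sysctl key maps to.
sysctl_key_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# check_sysctl_file FILE   (hypothetical helper)
# Print every key in FILE that has no entry under /proc/sys.
check_sysctl_file() {
    while IFS= read -r line; do
        case $line in
            ''|'#'*|';'*) continue ;;   # skip blank lines and comments
        esac
        key=$(printf '%s' "${line%%=*}" | tr -d ' \t')
        [ -e "$(sysctl_key_path "$key")" ] || printf 'unknown key: %s\n' "$key"
    done < "$1"
}

# Example: check_sysctl_file /etc/sysctl.d/k8s.conf
```

A key is also reported as unknown when the kernel module that provides it is not loaded yet, for example br_netfilter for the net.bridge.* keys, so a hit is a prompt to look closer rather than proof of a typo.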

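High-availability configuration

A note first: I use the officially recommended HAProxy + Keepalived combination, with both running as daemons on all master nodes.

1) Install keepalived and haproxy

# Install on all masters
yum install keepalived haproxy -y

2) Configure the HAProxy service
The haproxy configuration is identical on all master nodes; the configuration file is /etc/haproxy/haproxy.cfg. Finish the configuration on master1, then distribute it to master2 and master3.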

global
  maxconn  2000
  ulimit-n  16384
  log  127.0.0.1 local0 err
  stats timeout 30s
defaults
  log global
  mode  http
  option  httplog
  timeout connect 5000
  timeout client  50000
  timeout server  50000
  timeout http-request 15s
  timeout http-keep-alive 15s
frontend monitor-in
  bind *:33305
  mode http
  option httplog
  monitor-uri /monitor
listen stats
  bind    *:8006
  mode    http
  stats   enable
  stats   hide-version
  stats   uri       /stats
  stats   refresh   30s
  stats   realm     Haproxy\ Statistics
  stats   auth      admin:admin
frontend k8s-master
  bind 0.0.0.0:8443
  bind 127.0.0.1:8443
  mode tcp
  option tcplog
  tcp-request inspect-delay 5s
  default_backend k8s-master
backend k8s-master
  mode tcp
  option tcplog
  option tcp-check
  balance roundrobin
  default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
  server master1 10.228.81.119:6443  check inter 2000 fall 2 rise 2 weight 100
  server master2 10.228.81.118:6443  check inter 2000 fall 2 rise 2 weight 100
  server master3 10.228.81.128:6443  check inter 2000 fall 2 rise 2 weight 100

Note: set the three master IP addresses here to match your own environment.
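
Since only the server lines differ between environments, they can be generated instead of edited by hand. A small sketch, assuming the same check options as in the backend above; render_backend_servers is a hypothetical helper name and the name=ip argument format is my own convention.

```shell
# render_backend_servers NAME=IP...   (hypothetical helper)
# Print one HAProxy "server" line per master, pointing at the API server port 6443.
render_backend_servers() {
    for entry in "$@"; do
        name=${entry%%=*}   # text before the first "="
        ip=${entry#*=}      # text after the first "="
        printf '  server %s %s:6443  check inter 2000 fall 2 rise 2 weight 100\n' "$name" "$ip"
    done
}

# Example:
# render_backend_servers master1=10.228.81.119 master2=10.228.81.118 master3=10.228.81.128
```

After editing the file, haproxy -c -f /etc/haproxy/haproxy.cfg checks the configuration without starting the service.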

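3) Configure the Keepalived service
keepalived uses its track_script mechanism to run a script that detects whether the Kubernetes master on a node is down, and switches nodes accordingly to provide high availability.

The keepalived configuration file for master1 is shown below; adjust it following my comments. The file is located at /etc/keepalived/keepalived.conf (some configuration files use the cfg suffix instead).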

[root@master1 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script chk_kubernetes {
    script "/etc/keepalived/check_kubernetes.sh"
    interval 2
    weight -5
    fall 3
    rise 2
}
vrrp_instance VI_1 {
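    state MASTER                # change to BACKUP on the other two machines
    interface ens192            # your network interface name
    mcast_src_ip 10.228.81.119  # change to this host's IP
    virtual_router_id 60        # if startup fails, try changing this value
    priority 100                # I set master2 to 99 and master3 to 98, so master1 has the highest priority and the other two are backups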
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass K8SHA_KA_AUTH
    }
    virtual_ipaddress {
        10.228.81.99
    }
    track_script {
       chk_kubernetes
    }
}
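
Only three values change between the masters (state, mcast_src_ip and priority), so the other two files can be derived from master1's file. A rough sketch under that assumption; derive_keepalived_conf is a hypothetical helper name, and it relies on each of the three keywords starting its own line as in the listing above.

```shell
# derive_keepalived_conf SRC STATE IP PRIORITY   (hypothetical helper)
# Print SRC with the state, mcast_src_ip and priority values replaced;
# trailing comments and all other lines are left untouched.
derive_keepalived_conf() {
    sed -E \
        -e "s/^([[:space:]]*state)[[:space:]]+[A-Z]+/\\1 $2/" \
        -e "s/^([[:space:]]*mcast_src_ip)[[:space:]]+[0-9.]+/\\1 $3/" \
        -e "s/^([[:space:]]*priority)[[:space:]]+[0-9]+/\\1 $4/" \
        "$1"
}

# Example for master2:
# derive_keepalived_conf /etc/keepalived/keepalived.conf BACKUP 10.228.81.118 99 \
#     | ssh root@master2 'cat > /etc/keepalived/keepalived.conf'
```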

4) Configure the health check script

I keep the health check script in the /etc/keepalived directory. The check_kubernetes.sh script is as follows:

#!/bin/bash
#****************************************************************#
# ScriptName: check_kubernetes.sh
# Author: boming
# Create Date: 2020-06-23 22:19
#***************************************************************#
function check_kubernetes() {
 for ((i=0;i<5;i++));do
  apiserver_pid_id=$(pgrep kube-apiserver)
  if [[ ! -z $apiserver_pid_id ]];then
   return
  else
   sleep 2
  fi
  apiserver_pid_id=0
 done
}
# 1:running  0:stopped
check_kubernetes
if [[ $apiserver_pid_id -eq 0 ]];then
 /usr/bin/systemctl stop keepalived
 exit 1
else
 exit 0
fi
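
pgrep only shows that a kube-apiserver process exists, not that it answers requests. One possible variant probes the API server's health endpoint instead. This is a sketch, not the script from the original post: retry and apiserver_healthy are hypothetical names, and it assumes the API server listens on 127.0.0.1:6443 and serves /healthz without credentials, which should be checked on your own cluster.

```shell
# retry TRIES DELAY CMD...   (hypothetical helper)
# Run CMD until it succeeds, at most TRIES times, sleeping DELAY seconds between attempts.
retry() {
    retry_left=$1
    retry_delay=$2
    shift 2
    while [ "$retry_left" -gt 0 ]; do
        "$@" && return 0
        retry_left=$((retry_left - 1))
        [ "$retry_left" -gt 0 ] && sleep "$retry_delay"
    done
    return 1
}

# Assumption: the local API server answers /healthz on port 6443 without a client certificate.
apiserver_healthy() {
    curl --silent --fail --insecure --max-time 2 https://127.0.0.1:6443/healthz > /dev/null
}

# Possible use inside a keepalived check script:
# retry 5 2 apiserver_healthy || exit 1
```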

Configure the other masters following the comments above, then start the services and check that they came up: systemctl enable --now keepalived haproxy && systemctl status keepalived haproxy

To be safe, check that the VIP responds

[root@master1 ~]# ping 10.228.81.99
PING 10.228.81.99 (10.228.81.99) 56(84) bytes of data.
64 bytes from 10.228.81.99: icmp_seq=1 ttl=64 time=0.093 ms
64 bytes from 10.228.81.99: icmp_seq=2 ttl=64 time=0.053 ms
64 bytes from 10.228.81.99: icmp_seq=3 ttl=64 time=0.061 ms
^C
--- 10.228.81.99 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2045ms
rtt min/avg/max/mdev = 0.053/0.069/0.093/0.017 ms
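
A ping shows that some node answers on the VIP, but not which one. To see whether the current node holds it, the output of ip -o -4 addr show can be filtered. A small sketch; has_address is a hypothetical helper name, and it assumes the usual one-line-per-address output where the fourth field is the address with its prefix length.

```shell
# has_address ADDR   (hypothetical helper)
# Read `ip -o -4 addr show` output on stdin; succeed if ADDR is assigned to an interface.
has_address() {
    awk -v a="$1" '
        { split($4, p, "/"); if (p[1] == a) found = 1 }
        END { exit !found }
    '
}

# Example: ip -o -4 addr show | has_address 10.228.81.99 && echo "this node holds the VIP"
```

With the priorities used here, master1 should hold the VIP while its keepalived is running.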

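Deployment

With all the preparation done, we can start the actual cluster deployment.

1) Initialize the master1 node

# Generate a default configuration file first
kubeadm config print init-defaults > kubeadm-init.yaml

The generated YAML file needs a few changes and additions. The resulting configuration is shown below: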

apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.228.81.119    # this node's own IP (master1); the VIP goes into controlPlaneEndpoint and certSANs
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: master1
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  # configuration to add
  certSANs:
   - "10.228.81.99"     # VIP address
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers  # image registry address
controlPlaneEndpoint: "10.228.81.99:8443" # VIP address with the HAProxy port
kind: ClusterConfiguration
kubernetesVersion: v1.19.8  # Kubernetes version
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16  # add the pod subnet; this is flannel's default network range
scheduler: {}
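
Before running kubeadm init it can be worth confirming that the file really points at the VIP and the HAProxy port. A rough sketch of such a check; yaml_top_level_value is a hypothetical helper name, and it only handles unindented "key: value" lines, which is enough for controlPlaneEndpoint and kubernetesVersion in this file but is not a YAML parser.

```shell
# yaml_top_level_value KEY FILE   (hypothetical helper)
# Print the value of the first unindented "KEY: value" line, without quotes or trailing comment.
yaml_top_level_value() {
    awk -v k="$1" '
        index($0, k ":") == 1 {
            v = substr($0, length(k) + 2)
            sub(/[ \t]+#.*$/, "", v)        # drop a trailing comment
            gsub(/^[ \t]+|[ \t]+$/, "", v)  # trim surrounding whitespace
            gsub(/"/, "", v)                # drop double quotes
            print v
            exit
        }
    ' "$2"
}

# Example:
# [ "$(yaml_top_level_value controlPlaneEndpoint kubeadm-init.yaml)" = "10.228.81.99:8443" ] \
#     || echo "controlPlaneEndpoint does not match the VIP"
```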

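After editing, send this configuration to the other masters and pull the images in advance.

# Pull the images
kubeadm config images pull --config kubeadm-init.yaml
# Once the images are pulled, initialize on master1
kubeadm init --config kubeadm-init.yaml --upload-certs
# When initialization finishes, run the following steps as prompted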
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

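2) Install the flannel network plugin

For reasons I will not go into, the manifest URL is often not directly reachable, so installing straight from it fails. Either add an entry to /etc/hosts or download the YAML through a browser first.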

cat /etc/hosts
199.232.28.133  raw.githubusercontent.com
curl -o kube-flannel.yml   https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f kube-flannel.yml
kubectl get pods -n kube-system | grep flannel

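3) Add new master nodes

Copy the join commands printed at the end of the cluster initialization; they look roughly like this:

The command with --control-plane is the one for joining a master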

kubeadm join 10.228.81.99:8443 --token rp262l.015tj4ehxzbxp2tq     --discovery-token-ca-cert-hash sha256:010caec5acd2b508d2f33a3d872be0306bae7195959dd8699fec8e5ab0a142bd  --control-plane --certificate-key 22d117f1b29fe94bad304331cd6a871d964ef603aae7931767a488244f7df63d

The command without it is the one for joining a worker node

kubeadm join 10.228.81.99:8443 --token abcdef.0123456789abcdef     --discovery-token-ca-cert-hash sha256:010caec5acd2b508d2f33a3d872be0306bae7195959dd8699fec8e5ab0a142bd

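A few miscellaneous commands, noted here because you will probably need them. 😁

# For security reasons, k8s tokens expire by default
# List tokens
kubeadm token list
# Create a token that never expires
kubeadm token create --ttl 0
# Generate a node join command
kubeadm token create --print-join-command
# Generate a new --certificate-key for control-plane joins; used when adding a master node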
kubeadm init phase upload-certs --upload-certs
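
The value after --discovery-token-ca-cert-hash is the SHA-256 digest of the cluster CA's public key, so it can be recomputed on a master if the original kubeadm init output is gone. The sketch below wraps the openssl pipeline commonly given for this; ca_cert_hash is a hypothetical helper name, and it assumes an RSA CA key, which is what kubeadm generates by default.

```shell
# ca_cert_hash CERT   (hypothetical helper)
# Print the hex SHA-256 digest of the DER-encoded public key in CERT.
ca_cert_hash() {
    openssl x509 -pubkey -noout -in "$1" \
        | openssl rsa -pubin -outform der 2>/dev/null \
        | openssl dgst -sha256 -hex \
        | sed 's/^.* //'
}

# Example on a master:
# echo "sha256:$(ca_cert_hash /etc/kubernetes/pki/ca.crt)"
```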

Note: join the nodes one at a time rather than all at once. You may run into the following error:

[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: error syncing endpoints with etc: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher

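Check whether the cluster's high-availability configuration is at fault, for example whether the MASTER/BACKUP states and the priorities in the keepalived configuration are all set correctly.
If a join fails and you need to start over, run kubeadm reset to clean up, then try joining again.

Note: before installing Docker, check whether Docker is already installed on the machine; if it is, remove it with yum remove docker*. A clean, minimal OS installation is the best starting point. I was bitten by this several times.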

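Deploying Longhorn

Official site

The k8s cluster is now deployed. Deploying Longhorn is really not hard, just a few commands, yet that simple step kept me stuck for several days.

1) Install iscsi-initiator-utils

Before deploying Longhorn, install the package on the worker nodes with yum install iscsi-initiator-utils, or use kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.1.0/deploy/iscsi/longhorn-iscsi-installation.yaml. I ended up using both because I was not paying attention at first.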
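I then tried deleting the pods that perform the iscsi installation and have seen no problems in Longhorn so far, so one of the two methods should be enough.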
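
Whichever method is used, it is worth confirming on each node that the tools are actually present before deploying. A minimal sketch; missing_commands is a hypothetical helper name, and the tool list in the example beyond iscsiadm is my assumption about what Longhorn expects on a node, so compare it with the requirements of the version you install.

```shell
# missing_commands CMD...   (hypothetical helper)
# Print each command name that cannot be found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" > /dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Example on each node (the list is an assumption, adjust as needed):
# missing_commands iscsiadm curl findmnt grep awk blkid lsblk
```

No output means every listed command was found.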

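2) Deploy Longhorn

My deployment is finished, the pods are all up and I see no more errors, but a few problems came up along the way, mainly these:

Deploying 1.1.0 directly fails to start (I mean a direct install)

After deploying, the namespace cannot be deleted

After deleting the namespace, redeploying produces a pile of unexplained errors (I forgot to take screenshots)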

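Here is how I dealt with them:
Problem 1:

When deploying 1.1.0 directly, not even the UI pod came up. I then tried version 1.0.2, which started and brought up the required pods. So I deployed 1.0.2 first and then upgraded to 1.1.0, and that worked.

Problem 2:

For the namespace that cannot be deleted I used the curl method, with these steps:

First run kubectl get namespace test -o json > tmp.json to get the current namespace description, then open tmp.json and delete the spec field. Because this K8s cluster requires authentication, open a new terminal and run kubectl proxy to start an API proxy on local port 8001. Finally run curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json http://127.0.0.1:8001/api/v1/namespaces/test/finalize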

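# Write the current namespace definition to a JSON file
kubectl get namespace longhorn-system -o json > tmp.json
# Then delete the spec field in the JSON file, including the finalizers field nested under it
# The API requires authentication, so start an API proxy in a separate terminal; it listens on 127.0.0.1:8001
kubectl proxy --port=8001
# Send the finalize request
curl -k -H "Content-Type: application/json" -X PUT --data-binary @tmp.json http://127.0.0.1:8001/api/v1/namespaces/longhorn-system/finalize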

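Problem 3:

After deleting the namespace as in Problem 2, an immediate apply reports errors. There is not much to do except wait a while; as far as I remember I did nothing special, and it worked after running apply a few more times.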
