I. Introduction to RKE
1. Introduction: RKE is a CNCF-certified Kubernetes distribution whose components all run inside Docker containers.
Note that Rancher Server can only run on a Kubernetes cluster installed with RKE or K3s.
2. Prepare the node environment: open the required firewall ports
firewall-cmd --permanent --add-port=22/tcp
firewall-cmd --permanent --add-port=80/tcp
firewall-cmd --permanent --add-port=443/tcp
firewall-cmd --permanent --add-port=30000-32767/tcp
firewall-cmd --permanent --add-port=30000-32767/udp
firewall-cmd --reload
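A quick verification, plus the inter-node ports a Kubernetes cluster typically also needs (6443 for the API server, 2379-2380 for etcd, 10250 for the kubelet, 8472/udp for flannel VXLAN). Whether these must be opened explicitly depends on your topology, so treat this as an assumption to check against your environment:
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --add-port=2379-2380/tcp
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --permanent --add-port=8472/udp
firewall-cmd --reload
firewall-cmd --list-ports   # confirm the ports are open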
3. Synchronize node time
yum install ntpdate -y
ntpdate time.windows.com
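ntpdate sets the clock only once. One common way to keep clocks in sync afterwards is a cron entry; this is a sketch, not part of the original steps, and assumes ntpdate lives at /usr/sbin/ntpdate (the usual CentOS 7 path):
# append to the current crontab without overwriting existing entries
(crontab -l 2>/dev/null; echo '*/30 * * * * /usr/sbin/ntpdate time.windows.com >/dev/null 2>&1') | crontab -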
4. Install Docker
Docker must be installed on any node that will run Rancher Server, and on every cluster node, since RKE runs all components in containers.
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install docker-ce-18.09.3-3.el7
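Installing the package does not start the daemon; a minimal follow-up, assuming systemd (CentOS 7):
sudo systemctl enable docker
sudo systemctl start docker
sudo docker version   # confirm the daemon is running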
5. Install kubectl
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
sudo yum install kubectl
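A quick sanity check; kubectl version --client prints only the local client version, so it works before any cluster exists:
kubectl version --client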
6. Install RKE
Rancher Kubernetes Engine, the CLI used to build Kubernetes clusters.
Download: https://github.com/rancher/rke/releases/tag/v1.1.3
mv rke_linux-amd64 rke
chmod +x rke
mv rke /usr/local/bin/rke
7. Install Helm
Helm is the package manager for Kubernetes.
Download: https://github.com/helm/helm
tar -zxvf helm-v3.3.1-linux-amd64.tar.gz
cd linux-amd64
mv helm /usr/local/bin
chown -R admin:admin /usr/local/bin/helm
helm version
8. Configure passwordless SSH
Add the user to the docker group so that it can run docker commands:
sudo usermod -aG docker admin
Switch to the admin user, generate an SSH key pair on the host where rke up will be executed, and distribute the public key to every node:
ssh-keygen -t rsa
ssh-copy-id 192.168.112.120
ssh-copy-id 192.168.112.121
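A useful check before running rke up: the SSH user must be able to run docker on every node without sudo, or provisioning will fail. Using the example addresses above:
ssh admin@192.168.112.120 docker ps
ssh admin@192.168.112.121 docker ps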
9. Configure OS parameters for the Kubernetes cluster (run on every node)
sudo swapoff -a
Add the following lines to /etc/sysctl.conf (sudo vi /etc/sysctl.conf):
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
Then apply them:
sudo sysctl -p
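Two follow-ups the steps above leave implicit (both assume a stock CentOS 7 kernel): the bridge-nf-call keys only exist once the br_netfilter module is loaded, and swapoff -a does not survive a reboot:
sudo modprobe br_netfilter
echo br_netfilter | sudo tee /etc/modules-load.d/br_netfilter.conf   # load on every boot
sudo sed -i '/ swap / s/^/#/' /etc/fstab   # keep swap off after reboots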
10. Generate the initial cluster configuration file with rke
rke config --name cluster.yml
RKE uses a file named cluster.yml to determine how Kubernetes will be deployed on the cluster's nodes:
# If you intend to deploy Kubernetes in an air-gapped environment,
# please consult the documentation on how to configure custom RKE images.
nodes:
- address: "192.168.30.110"
  port: "22"
  internal_address: ""
  role: [controlplane,etcd,worker]
  hostname_override: "node1"
  user: admin
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: "192.168.30.129"
  port: "22"
  internal_address: ""
  role: [controlplane,etcd,worker]
  hostname_override: "node2"
  user: admin
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: "192.168.30.133"
  port: "22"
  internal_address: ""
  role: [controlplane,etcd,worker]
  hostname_override: "node3"
  user: admin
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    uid: 0
    gid: 0
    snapshot: null
    retention: ""
    creation: ""
    backup_config: null
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    always_pull_images: false
    secrets_encryption_config: null
    audit_log: null
    admission_configuration: null
    event_rate_limit: null
  kube-controller:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
  kubelet:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_domain: cluster.local
    infra_container_image: ""
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
    generate_serving_certificate: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
network:
  plugin: flannel
  options: {}
  mtu: 0
  node_selector: {}
  update_strategy: null
authentication:
  strategy: x509
  sans: []
  webhook: null
addons: ""
addons_include: []
system_images:
  etcd: rancher/coreos-etcd:v3.4.3-rancher1
  alpine: rancher/rke-tools:v0.1.58
  nginx_proxy: rancher/rke-tools:v0.1.58
  cert_downloader: rancher/rke-tools:v0.1.58
  kubernetes_services_sidecar: rancher/rke-tools:v0.1.58
  kubedns: rancher/k8s-dns-kube-dns:1.15.2
  dnsmasq: rancher/k8s-dns-dnsmasq-nanny:1.15.2
  kubedns_sidecar: rancher/k8s-dns-sidecar:1.15.2
  kubedns_autoscaler: rancher/cluster-proportional-autoscaler:1.7.1
  coredns: rancher/coredns-coredns:1.6.9
  coredns_autoscaler: rancher/cluster-proportional-autoscaler:1.7.1
  nodelocal: rancher/k8s-dns-node-cache:1.15.7
  kubernetes: rancher/hyperkube:v1.18.3-rancher2
  flannel: rancher/coreos-flannel:v0.12.0
  flannel_cni: rancher/flannel-cni:v0.3.0-rancher6
  calico_node: rancher/calico-node:v3.13.4
  calico_cni: rancher/calico-cni:v3.13.4
  calico_controllers: rancher/calico-kube-controllers:v3.13.4
  calico_ctl: rancher/calico-ctl:v3.13.4
  calico_flexvol: rancher/calico-pod2daemon-flexvol:v3.13.4
  canal_node: rancher/calico-node:v3.13.4
  canal_cni: rancher/calico-cni:v3.13.4
  canal_flannel: rancher/coreos-flannel:v0.12.0
  canal_flexvol: rancher/calico-pod2daemon-flexvol:v3.13.4
  weave_node: weaveworks/weave-kube:2.6.4
  weave_cni: weaveworks/weave-npc:2.6.4
  pod_infra_container: rancher/pause:3.1
  ingress: rancher/nginx-ingress-controller:nginx-0.32.0-rancher1
  ingress_backend: rancher/nginx-ingress-controller-defaultbackend:1.5-rancher1
  metrics_server: rancher/metrics-server:v0.3.6
  windows_pod_infra_container: rancher/kubelet-pause:v0.1.4
ssh_key_path: ~/.ssh/id_rsa
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: null
kubernetes_version: ""
private_registries: []
ingress:
  provider: ""
  options: {}
  node_selector: {}
  extra_args: {}
  dns_policy: ""
  extra_envs: []
  extra_volumes: []
  extra_volume_mounts: []
  update_strategy: null
cluster_name: ""
cloud_provider:
  name: ""
prefix_path: ""
addon_job_timeout: 0
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
monitoring:
  provider: ""
  options: {}
  node_selector: {}
  update_strategy: null
  replicas: null
restore:
  restore: false
  snapshot_name: ""
dns: null
11. Deploy
rke up
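On success, rke up writes two files next to cluster.yml: kube_config_cluster.yml, the kubeconfig used in the next step, and cluster.rkestate, the cluster state file that later rke up runs need (do not delete it):
ls -1
# cluster.rkestate
# cluster.yml
# kube_config_cluster.yml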
12. Set environment variables
export KUBECONFIG=/home/admin/kube_config_cluster.yml
mkdir ~/.kube
cp kube_config_cluster.yml ~/.kube/config
The k8s cluster has now been installed successfully via RKE. Some nodes start slowly, so allow some time for everything to come up.
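You can watch the nodes come up using the kubeconfig configured above:
kubectl get nodes   # all three nodes should eventually report STATUS Ready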
To speed this up, pick one machine with a good network connection, pull all of the images there, save them to a local archive, and copy it to all hosts:
docker save -o images.tgz `docker images|awk 'NR>1 {print $1":"$2}'`
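On the receiving hosts the archive is loaded back with docker load. A minimal distribution sketch, assuming the node addresses from the cluster.yml above:
for host in 192.168.30.129 192.168.30.133; do
  scp images.tgz admin@"$host":/tmp/
  ssh admin@"$host" 'docker load -i /tmp/images.tgz'
done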
II. Cleaning Up the RKE Environment
1. Run the following cleanup script on rancher-node-1, 2, and 3:
mkdir rancher
cat > rancher/clear.sh << 'EOF'
df -h | grep kubelet | awk -F % '{print $2}' | xargs umount
rm /var/lib/kubelet/* -rf
rm /etc/kubernetes/* -rf
rm /var/lib/rancher/* -rf
rm /var/lib/etcd/* -rf
rm /var/lib/cni/* -rf
rm -rf /var/run/calico
iptables -F && iptables -t nat -F
ip link del flannel.1
docker ps -a | awk 'NR>1 {print $1}' | xargs docker rm -f
docker volume ls | awk 'NR>1 {print $2}' | xargs docker volume rm
rm -rf /var/etcd/
rm -rf /run/kubernetes/
docker rm -fv $(docker ps -aq)
docker volume rm $(docker volume ls -q)
rm -rf /etc/cni
rm -rf /opt/cni
systemctl restart docker
EOF
sh rancher/clear.sh
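If you prefer to drive the cleanup from a single machine, a sketch (the rancher-node-1/2/3 hostnames are the ones named above; adjust the user and sudo setup to your environment):
for host in rancher-node-1 rancher-node-2 rancher-node-3; do
  scp rancher/clear.sh "$host":/tmp/clear.sh
  ssh "$host" 'sudo sh /tmp/clear.sh'
done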
2. This completes the cleanup of leftover directories. If problems remain, you may also need to uninstall Docker on every node.
First check the installed Docker version:
# yum list installed | grep docker
docker-ce.x86_64   18.05.0.ce-3.el7.centos   @docker-ce-edge
Uninstall it:
# yum -y remove docker-ce.x86_64
Delete the storage directories:
# rm -rf /etc/docker
# rm -rf /run/docker
# rm -rf /var/lib/dockershim
# rm -rf /var/lib/docker
If a directory cannot be deleted, umount it first, e.g.:
# umount /var/lib/docker/devicemapper
rke up --config=./rancher-cluster.yml
rke up is idempotent; it sometimes has to be run several times before it succeeds.
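Because the command is idempotent, retrying is safe. A minimal retry-loop sketch:
for i in 1 2 3; do
  rke up --config=./rancher-cluster.yml && break
  echo "rke up attempt $i failed, retrying in 30s..."
  sleep 30
done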
3. Problems after repeatedly installing and uninstalling the k8s cluster with rke
Symptom: on startup, the etcd cluster health check fails.
Fix: clear all Kubernetes-related directories on every node, then uninstall Docker, delete all of its directories, and reinstall Docker.
Finally, run the rke up command again.
III. Adding and Removing Nodes
1. Add a node
Edit cluster.yml, adding the configuration for the new node, then run the command in step 2:
more cluster.yml
nodes:
- address: 172.20.101.103
  user: ptmind
  role: [controlplane,worker,etcd]
- address: 172.20.101.104
  user: ptmind
  role: [controlplane,worker,etcd]
- address: 172.20.101.105
  user: ptmind
  role: [controlplane,worker,etcd]
- address: 172.20.101.106
  user: ptmind
  role: [worker]
  labels: {traefik: traefik-outer}
2. Run the add-node operation
rke up --update-only
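Once the run completes, the new node should appear in the node list:
kubectl get nodes -o wide   # 172.20.101.106 should show up as a Ready worker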
3. Remove a node
Edit cluster.yml, deleting the configuration of the node to be removed, then run the command in step 4:
more cluster.yml
nodes:
- address: 172.20.101.103
  user: ptmind
  role: [controlplane,worker,etcd]
- address: 172.20.101.104
  user: ptmind
  role: [controlplane,worker,etcd]
- address: 172.20.101.105
  user: ptmind
  role: [controlplane,worker,etcd]
# deleted: - address: 172.20.101.106
# deleted:   user: ptmind
# deleted:   role: [worker]
# deleted:   labels: {traefik: traefik-outer}
4. Run the remove-node operation
rke up --update-only
Problem: when a node is in the NotReady state, operations on it fail; for example, a delete-node operation will return an error and the node cannot be removed.
Workarounds:
1. Manually delete the components on the node.
2. Remove the node's role with a command:
kubectl label node prod-129 node-role.kubernetes.io/controlplane-
Problem: a cluster node is stuck in the SchedulingDisabled state.
Solution:
kubectl patch node NodeName -p "{\"spec\":{\"unschedulable\":false}}"
Or:
Mark a node unschedulable:
kubectl cordon node07-ingress
Mark it schedulable again:
kubectl uncordon node07-ingress
Evict the pods on the node:
kubectl drain --ignore-daemonsets --delete-local-data node07-ingress
Delete the node:
kubectl delete node node07-ingress