Check phase
Run the pre-deployment checks:
# ansible-playbook -vv playbooks/prerequisites.yml
Look at the play recap to confirm that everything passed. If something did not, track down the cause and run the playbook again.
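Reading a long -vv log for the recap is easy to get wrong, so a small filter can help. The sketch below assumes the usual PLAY RECAP layout (host, a colon, then counters such as unreachable=0 failed=0) and that the run was saved to a log file; the function name recap_failed_hosts and the file name prerequisites.log are made up for illustration.

```shell
# Print every host in the PLAY RECAP whose unreachable= or failed= counter is non-zero.
# Reads ansible-playbook output on stdin.
recap_failed_hosts() {
    awk '
        /^PLAY RECAP/ { in_recap = 1; next }
        in_recap && /unreachable=/ {
            bad = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^(unreachable|failed)=/) {
                    split($i, kv, "=")
                    if (kv[2] + 0 > 0) bad = 1
                }
            }
            if (bad) print $1
        }
    '
}

# Usage (hypothetical log file name):
#   ansible-playbook -vv playbooks/prerequisites.yml | tee prerequisites.log
#   recap_failed_hosts < prerequisites.log
```

No output means every host in the recap finished with unreachable=0 and failed=0.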
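In earlier runs of this check, node1 and node2 often failed to reach repo/base (the RHEL 7.6 packages) on master, which is configured as the yum source. The temporary workaround is to point each node's repo configuration at its own locally mounted source, which sidesteps the error.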
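For reference, the local-source workaround might look roughly like the sketch below. Everything specific here is an assumption, not taken from the actual setup: the repo id local-base, the mount point /mnt/rhel76, and gpgcheck=0 (a lab shortcut only).

```shell
# write_local_repo <repo-file> <package-dir>
# Writes a yum repo definition that reads packages from a local directory.
# <package-dir> must be an absolute path, e.g. where the RHEL 7.6 media is mounted.
write_local_repo() {
    repo_file=$1
    pkg_dir=$2
    cat > "$repo_file" <<EOF
[local-base]
name=Local RHEL 7.6 base packages
baseurl=file://$pkg_dir
enabled=1
gpgcheck=0
EOF
}

# Usage on node1 and node2 (hypothetical paths):
#   write_local_repo /etc/yum.repos.d/local-base.repo /mnt/rhel76
#   yum clean all && yum repolist
```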
Deployment phase
# ansible-playbook -vv /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml
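Read the messages printed when the installation finishes. If anything failed, fix the problem and run the playbook again, repeating until it succeeds.
Verifying the installation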
[root@master yum.repos.d]# oc login -u system:admin
Logged into "https://master.example.com:8443" as "system:admin" using existing credentials.

You have access to the following projects and can switch between them with 'oc project <projectname>':

  * default
    kube-public
    kube-system
    management-infra
    openshift
    openshift-console
    openshift-infra
    openshift-logging
    openshift-metrics-server
    openshift-monitoring
    openshift-node
    openshift-sdn
    openshift-web-console

Using project "default".
[root@master yum.repos.d]# oc get nodes
NAME                 STATUS    ROLES     AGE       VERSION
master.example.com   Ready     master    23m       v1.11.0+d4cacc0
node1.example.com    Ready     infra     18m       v1.11.0+d4cacc0
node2.example.com    Ready     compute   18m       v1.11.0+d4cacc0
[root@master yum.repos.d]# oc get pods --all-namespaces
NAMESPACE                  NAME                                           READY   STATUS         RESTARTS   AGE
default                    docker-registry-1-9q962                        1/1     Running        0          17m
default                    registry-console-1-4mb7d                       1/1     Running        0          17m
default                    router-1-74pr6                                 1/1     Running        0          17m
kube-system                master-api-master.example.com                  1/1     Running        0          22m
kube-system                master-controllers-master.example.com          1/1     Running        1          22m
kube-system                master-etcd-master.example.com                 1/1     Running        0          22m
openshift-console          console-5896bbb547-df6p2                       1/1     Running        0          15m
openshift-infra            hawkular-cassandra-1-k5bg2                     1/1     Running        0          12m
openshift-infra            hawkular-metrics-6ldrw                         0/1     Pending        0          6m
openshift-infra            hawkular-metrics-858mh                         0/1     Preempting     0          12m
openshift-infra            hawkular-metrics-schema-sd7c5                  0/1     Completed      0          13m
openshift-infra            heapster-tvn6t                                 1/1     Running        0          12m
openshift-logging          logging-es-data-master-4g5tbuou-1-bcnsx        0/2     Pending        0          5m
openshift-logging          logging-es-data-master-4g5tbuou-1-deploy       1/1     Running        0          5m
openshift-logging          logging-fluentd-m5rbg                          1/1     Running        0          6m
openshift-logging          logging-fluentd-m64sn                          1/1     Running        0          6m
openshift-logging          logging-fluentd-nqpz4                          1/1     Running        0          6m
openshift-logging          logging-kibana-1-wpf2t                         2/2     Running        0          7m
openshift-metrics-server   metrics-server-845b478887-vcbkd                0/1     ErrImagePull   0          11m
openshift-monitoring       alertmanager-main-0                            3/3     Running        0          14m
openshift-monitoring       alertmanager-main-1                            3/3     Running        0          14m
openshift-monitoring       alertmanager-main-2                            3/3     Running        0          14m
openshift-monitoring       cluster-monitoring-operator-674969789d-65rxn   1/1     Running        0          16m
openshift-monitoring       grafana-7594d8dd75-cwr6p                       2/2     Running        0          15m
openshift-monitoring       kube-state-metrics-787f69cf4d-xjh76            3/3     Running        0          14m
openshift-monitoring       node-exporter-bwvpv                            2/2     Running        0          14m
openshift-monitoring       node-exporter-hzbb8                            2/2     Running        0          14m
openshift-monitoring       node-exporter-rdzlp                            2/2     Running        0          14m
openshift-monitoring       prometheus-k8s-0                               4/4     Running        1          15m
openshift-monitoring       prometheus-k8s-1                               4/4     Running        1          15m
openshift-monitoring       prometheus-operator-8544897d54-z7249           1/1     Running        0          16m
openshift-node             sync-6xthq                                     1/1     Running        0          20m
openshift-node             sync-rsgz9                                     1/1     Running        0          19m
openshift-node             sync-vsbws                                     1/1     Running        0          19m
openshift-sdn              ovs-5d2dl                                      1/1     Running        0          20m
openshift-sdn              ovs-gd4gw                                      1/1     Running        0          19m
openshift-sdn              ovs-ktpt6                                      1/1     Running        0          19m
openshift-sdn              sdn-dz8kv                                      1/1     Running        0          19m
openshift-sdn              sdn-mhbkg                                      1/1     Running        0          19m
openshift-sdn              sdn-x7tq9                                      1/1     Running        0          20m
openshift-web-console      webconsole-5db89b6cd4-5sm9d                    1/1     Running        2          16m
The metrics components are not up yet.
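With this many pods, the broken ones are easy to miss by eye. A minimal sketch of a filter over the listing above, assuming the default six-column layout of oc get pods --all-namespaces (NAMESPACE NAME READY STATUS RESTARTS AGE); the function name unhealthy_pods is made up.

```shell
# List pods that are neither Completed nor fully ready and Running.
# Reads `oc get pods --all-namespaces` output on stdin.
unhealthy_pods() {
    awk '
        $1 == "NAMESPACE" { next }          # skip the header row
        $4 == "Completed" { next }          # finished jobs are fine
        {
            split($3, ready, "/")           # READY column, e.g. 0/2
            if ($4 != "Running" || ready[1] != ready[2])
                print $1 "/" $2, $4, $3
        }
    '
}

# Usage:
#   oc get pods --all-namespaces | unhealthy_pods
```

Run against the listing above, this would leave only the Pending, Preempting and ErrImagePull pods in openshift-infra, openshift-logging and openshift-metrics-server.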
Create the admin user on the master node:
# htpasswd /etc/origin/master/htpasswd admin
Then grant the admin user the cluster-admin role:
# oc adm policy add-cluster-role-to-user cluster-admin admin
Add the following entries to the hosts file:
192.168.0.103 master.example.com
192.168.0.104 console.apps.example.com
192.168.0.104 prometheus-k8s-openshift-monitoring.apps.example.com
192.168.0.104 grafana-openshift-monitoring.apps.example.com
192.168.0.104 hawkular-metrics.apps.example.com
Open https://master.example.com:8443 and switch to the cluster console, where the cluster's monitoring information is available.
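Since the playbook gets rerun a lot, the hosts entries are worth adding in a way that does not pile up duplicates. A rough sketch, assuming a Linux-style hosts file on the machine running the browser; the helper name add_host_entry is made up.

```shell
# add_host_entry <hosts-file> <ip> <hostname>
# Appends "<ip> <hostname>" unless the hostname is already mapped on a non-comment line.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    if awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$hosts_file"; then
        return 0                             # already present, nothing to do
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Usage (as root):
#   add_host_entry /etc/hosts 192.168.0.103 master.example.com
#   add_host_entry /etc/hosts 192.168.0.104 console.apps.example.com
```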
Fixing errors
- Metrics
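After some digging, metrics failed to start for three main reasons:
1. The ose-metrics-server image was missing. Re-importing it fixed this.
2. In openshift-monitoring, the node-exporter-sbddr pod on node2 kept failing to start. The cause turned out to be a port conflict with a GitLab installation on that node; once GitLab was stopped, the pod started.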
[root@master ~]# oc get pods -n openshift-monitoring -o wide
NAME                                           READY   STATUS    RESTARTS   AGE   IP              NODE                 NOMINATED NODE
alertmanager-main-0                            3/3     Running   23         21h   10.129.0.69     node1.example.com    <none>
alertmanager-main-1                            3/3     Running   20         21h   10.129.0.66     node1.example.com    <none>
alertmanager-main-2                            3/3     Running   20         21h   10.129.0.68     node1.example.com    <none>
cluster-monitoring-operator-674969789d-65rxn   1/1     Running   10         21h   10.129.0.65     node1.example.com    <none>
grafana-7594d8dd75-cwr6p                       2/2     Running   18         21h   10.129.0.64     node1.example.com    <none>
kube-state-metrics-787f69cf4d-xjh76            3/3     Running   20         21h   10.129.0.71     node1.example.com    <none>
node-exporter-bwvpv                            2/2     Running   8          21h   192.168.0.104   node1.example.com    <none>
node-exporter-hzbb8                            2/2     Running   14         21h   192.168.0.103   master.example.com   <none>
node-exporter-sbddr                            2/2     Running   0          13m   192.168.0.105   node2.example.com    <none>
prometheus-k8s-0                               4/4     Running   22         21h   10.129.0.70     node1.example.com    <none>
prometheus-k8s-1                               4/4     Running   22         21h   10.129.0.67     node1.example.com    <none>
prometheus-operator-8544897d54-z7249           1/1     Running   8          21h   10.129.0.63     node1.example.com    <none>
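A port conflict like the GitLab one can be confirmed on the node by checking who is already listening. The sketch below filters ss -ltnp output; the helper name port_owner is made up, and port 9100 in the usage line is only an assumption (the usual node_exporter port), since these notes do not record which port actually clashed.

```shell
# port_owner <port>
# Reads `ss -ltnp` output on stdin and prints the listening address and the
# owning process column for the given TCP port.
port_owner() {
    port=$1
    awk -v p="$port" '$1 == "LISTEN" && $4 ~ (":" p "$") { print $4, $NF }'
}

# Usage on node2 (as root, so that ss shows process names); 9100 is an assumption:
#   ss -ltnp | port_owner 9100
```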
3. In openshift-infra, the hawkular-metrics-9r5nc pod stayed in Pending. Describing it showed that it requests 1.5G of memory; after the request in rc hawkular-metrics was changed to 500m, the pod started.
[root@master ~]# oc get pods -n openshift-infra -o wide
NAME                            READY   STATUS      RESTARTS   AGE   IP            NODE                 NOMINATED NODE
hawkular-cassandra-1-k5bg2      1/1     Running     4          21h   10.130.0.42   node2.example.com    <none>
hawkular-metrics-9r5nc          1/1     Running     0          11m   10.129.0.75   node1.example.com    <none>
hawkular-metrics-schema-sd7c5   0/1     Completed   0          21h   10.130.0.3    node2.example.com    <none>
heapster-tvn6t                  1/1     Running     39         21h   10.128.0.53   master.example.com   <none>
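One way the memory request change could be scripted is with oc set resources, shown here behind a dry-run switch. This is a sketch rather than the exact commands used at the time: the wrapper name lower_memory_request and the DRY_RUN variable are made up, and the value 500Mi is an assumption about what "500m" was meant to be.

```shell
# lower_memory_request <namespace> <rc-name> <memory>
# Sets the memory request on a replication controller's pod template.
# With DRY_RUN=1 the oc command is only printed, not executed.
lower_memory_request() {
    ns=$1
    rc=$2
    mem=$3
    cmd="oc set resources rc/$rc -n $ns --requests=memory=$mem"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

# Usage:
#   DRY_RUN=1 lower_memory_request openshift-infra hawkular-metrics 500Mi
#   lower_memory_request openshift-infra hawkular-metrics 500Mi
#   oc delete pod <pending-pod-name> -n openshift-infra
```

Changing the replication controller only affects pods created afterwards, so the Pending pod has to be deleted for a replacement with the smaller request to be scheduled.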
At last there is something worth a screenshot.
- EFK
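After some digging, the main cause is insufficient memory, so the current 16G machines cannot cope. Judging by the pod's start command, a single instance needs 8G just to start. Outrageous!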