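I spent the last few days setting up an offline OperatorHub in a customer environment. An OpenShift 4.3 cluster was already installed, so the goal was only to install the Service Mesh and Serverless components that were being evaluated. Because of earlier work I had already installed similar components in a disconnected 4.2 environment, so I set off with only light preparation. I still ran into plenty of problems and pitfalls: compared with 4.2, 4.3 is much improved for disconnected installs, but it also hides a few new traps. This post is a rough record of what happened.
Thanks also to the seniors and predecessors whose pointers helped me think clearly in the middle of the chaos.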
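1. Building the catalog image
Because the network was very slow, I recommend mirroring the base image to a local registry first and working from there: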
oc image mirror registry.redhat.io/openshift4/ose-operator-registry:v4.3 registry.example.com/openshift4/ose-operator-registry
Build the local catalog image:
oc adm catalog build --appregistry-org redhat-operators --from=registry.example.com/openshift4/ose-operator-registry:v4.3 --to=registry.example.com/olm/redhat-operators:v1 --insecure
Generate the manifests that list the images to mirror:
oc adm catalog mirror --manifests-only registry.example.com/olm/redhat-operators:v1 registry.example.com --insecure
The resulting directory structure looks like this:
[root@registry test]# tree redhat-operators-manifests/
redhat-operators-manifests/
├── imageContentSourcePolicy.yaml
└── mapping.txt
Open mapping.txt and take a look:
registry.redhat.io/openshift-service-mesh/istio-rhel8-operator:1.0.5=registry.example.com/openshift-service-mesh/istio-rhel8-operator:1.0.5
registry.redhat.io/openshift-service-mesh/3scale-istio-adapter-rhel8@sha256:00fb544a95b16c652cc571396679c65d5889b2cfe6f1a0176f560a1678309a35=registry.example.com/openshift-service-mesh/3scale-istio-adapter-rhel8
registry.redhat.io/container-native-virtualization/kubevirt-kvm-info-nfd-plugin@sha256:bb120df34c6eef21431a074f11a1aab80e019621e86b3ffef4d10d24cb64d2df=registry.example.com/container-native-virtualization/kubevirt-kvm-info-nfd-plugin
It consists almost entirely of the sha256-pinned images needed to install the operators, each mapped to its location on the local registry server.
The best approach would be to download all of the images using the command below, but since we only needed two components I did it by hand. (That decision is what cost so much working time and so many repeated image imports.)
oc apply -f ./redhat-operators-manifests
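The command above is the official approach. I tested it in the afternoon and found that it needs a running cluster, so I wrote my own script to download in batches: first trim the list of images to download by namespace, then mirror them in bulk with the script.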
[root@registry redhat-operators-manifests]# cat batchmirror.sh
#!/bin/bash
# Mirror every source=target pair listed in eventing.txt, one per line.
i=0
while IFS= read -r line
do
  i=$((i + 1))
  echo "$i"
  source=$(echo "$line" | cut -d'=' -f 1)
  echo "$source"
  target=$(echo "$line" | cut -d'=' -f 2)
  echo "$target"
  skopeo copy --all "docker://$source" "docker://$target"
  sleep 20
done < eventing.txt
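The "trim by namespace" step is not shown in the post, so here is a minimal sketch of one way it could be done. The function name filter_mapping is hypothetical, and the sketch assumes every line of mapping.txt has the source=target shape shown earlier, with the namespace as the first path segment after the registry host.

```shell
# Hypothetical helper (not from the original post): keep only the mapping
# lines whose source image lives under one of the given namespaces.
# Usage: filter_mapping MAPPING_FILE NAMESPACE...
filter_mapping() {
  mapping_file=$1
  shift
  for ns in "$@"; do
    # Match "<registry-host>/<namespace>/" at the start of the source side.
    grep -E "^[^/=]+/${ns}/" "$mapping_file"
  done
}
```

For example, `filter_mapping mapping.txt openshift-service-mesh distributed-tracing > eventing.txt` would produce the input file that batchmirror.sh reads. The namespace names here are only examples taken from elsewhere in this post.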
2. Creating the offline OperatorHub catalog
This step is fairly easy. It mainly comes down to disabling the default catalog sources:
oc patch OperatorHub cluster --type json \
  -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'
Then create a file named catalogsource.yaml:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: my-operator-catalog
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: registry.example.com/olm/redhat-operators:v1
  displayName: My Operator Catalog
  publisher: grpc
Once it is created, check the result; the OperatorHub page should list all of the Red Hat operators.
oc create -f catalogsource.yaml
oc get pods -n openshift-marketplace
oc get catalogsource -n openshift-marketplace
oc describe catalogsource my-operator-catalog -n openshift-marketplace
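3. Downloading the operator and component images per module
This is where the pitfalls really pile up. I first installed the ElasticSearch Operator and got an Image Pull Error, then looked up the specific sha256 digest in the mapping file, for example: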
registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:aa0c7b11a655454c5ac6cbc772bc16e51ca5004eedccf03c52971e8228832370
With the 4.2 approach, all you need to run is:
oc image mirror registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:0203a2a6d55763ed09b2517c656d035af439553c7915e55e4cc93f5bcda3989f registry.example.com/openshift4/ose-elasticsearch-operator
After it runs successfully, pull the image locally to verify:
podman pull registry.example.com/openshift4/ose-elasticsearch-operator@sha256:0203a2a6d55763ed09b2517c656d035af439553c7915e55e4cc93f5bcda3989f
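You will find that it cannot be pulled at all. Reportedly this is because in 4.3 some of these sha256 digests refer to multi-layer images (presumably manifest lists, which is what the --all option of skopeo copy handles). The fix is: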
skopeo copy --all docker://registry.redhat.io/openshift4/ose-elasticsearch-operator@sha256:0203a2a6d55763ed09b2517c656d035af439553c7915e55e4cc93f5bcda3989f docker://registry.example.com/openshift4/ose-elasticsearch-operator
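Then pack the registry's storage directory into a tar archive and unpack it in the offline environment.
Since most operator images are referenced by sha256 digest, each one has to be copied with skopeo individually. This consumed a lot of time.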
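The lookup described at the start of this section (take the image from the pull error, find its line in mapping.txt) can be sketched as a small function. The name lookup_digest is hypothetical, and the sketch assumes the failing image reference has the form name@sha256:digest, as in the example above.

```shell
# Hypothetical helper (not from the original post): print the mapping line
# for a digest-pinned image reference taken from an image pull error.
# Usage: lookup_digest IMAGE_REF MAPPING_FILE
lookup_digest() {
  image_ref=$1
  mapping_file=$2
  # Strip everything up to and including the last '@', leaving "sha256:...".
  digest=${image_ref##*@}
  # In mapping.txt a digest-pinned source is immediately followed by '='.
  grep -F "@${digest}=" "$mapping_file"
}
```

The matching line gives both the source and the target, so it can be appended to the input file of batchmirror.sh or split by hand for a single skopeo copy --all.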
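4. The sample-registries.conf file
The purpose of this file is to map source addresses to target addresses, so that CRI-O on OCP knows where to pull images that are referenced by their source address.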
unqualified-search-registries = ["docker.io"]

[[registry]]
  location = "quay.io/openshift-release-dev/ocp-release"
  insecure = false
  blocked = false
  mirror-by-digest-only = false
  prefix = ""

  [[registry.mirror]]
    location = "YOUR_REGISTRY_URL/ocp4/openshift4"
    insecure = false

[[registry]]
  location = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"
  insecure = false
  blocked = false
  mirror-by-digest-only = false
  prefix = ""

  [[registry.mirror]]
    location = "YOUR_REGISTRY_URL/ocp4/openshift4"
    insecure = false

[[registry]]
  location = "registry.redhat.io/distributed-tracing"
  insecure = false
  blocked = false
  mirror-by-digest-only = false
  prefix = ""

  [[registry.mirror]]
    location = "YOUR_REGISTRY_URL/distributed-tracing"
    insecure = false

[[registry]]
  location = "registry.redhat.io/openshift-service-mesh"
  insecure = false
  blocked = false
  mirror-by-digest-only = false
  prefix = ""

  [[registry.mirror]]
    location = "YOUR_REGISTRY_URL/openshift-service-mesh"
    insecure = false

[[registry]]
  location = "registry.redhat.io/openshift4"
  insecure = false
  blocked = false
  mirror-by-digest-only = false
  prefix = ""

  [[registry.mirror]]
    location = "YOUR_REGISTRY_URL/openshift4"
    insecure = false
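Since a wrong mirror entry only shows up later as a failed pull, it can help to print the source-to-mirror pairs before distributing the file. The following is a rough sketch using awk; list_mirrors is a hypothetical name, and the parsing assumes the simple layout shown above (one quoted location per table, each mirror table following its registry table). It is not a general TOML parser.

```shell
# Hypothetical helper (not from the original post): print "source -> mirror"
# for every [[registry.mirror]] entry in a registries.conf-style file.
# Usage: list_mirrors REGISTRIES_CONF
list_mirrors() {
  awk '
    /^[[:space:]]*\[\[registry\]\]/         { in_mirror = 0; next }
    /^[[:space:]]*\[\[registry\.mirror\]\]/ { in_mirror = 1; next }
    /^[[:space:]]*location[[:space:]]*=/ {
      value = $0
      sub(/^[^"]*"/, "", value)   # drop everything up to the opening quote
      sub(/".*$/, "", value)      # drop the closing quote and the rest
      if (in_mirror) print source " -> " value
      else source = value
    }
  ' "$1"
}
```

Running `list_mirrors sample-registries.conf` should print one line per mirror, which makes a leftover YOUR_REGISTRY_URL placeholder or a mistyped namespace easy to spot.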
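This configuration has to be pushed to every machine in the cluster. That rollout is performed by the machine-config cluster operator, and the normal procedure is:
Create a machineconfig.yaml and apply it to roll out the change...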
cat sample-registries.conf | base64 | tr -d '\n'

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  annotations:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 50-worker-container-registry-conf
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,${YOUR_FILE_CONTENT_IN_BASE64}
          verification: {}
        filesystem: root
        mode: 420
        path: /etc/containers/registries.conf

oc apply -f machineconfig.yaml
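Pasting the base64 string into the YAML by hand is easy to get wrong, so the two steps can be combined. This is a minimal sketch; render_machineconfig is a hypothetical name, and the manifest body simply repeats the one above with the placeholder filled in (the empty annotations key is left out).

```shell
# Hypothetical helper (not from the original post): print a MachineConfig
# manifest that embeds the given registries.conf as base64.
# Usage: render_machineconfig REGISTRIES_CONF > machineconfig.yaml
render_machineconfig() {
  conf_file=$1
  # Same encoding as in the post: base64 with the newlines stripped.
  encoded=$(base64 < "$conf_file" | tr -d '\n')
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 50-worker-container-registry-conf
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,${encoded}
          verification: {}
        filesystem: root
        mode: 420
        path: /etc/containers/registries.conf
EOF
}
```

Usage would be `render_machineconfig sample-registries.conf > machineconfig.yaml` followed by the oc apply shown above. Note that the role label targets only the worker pool, as in the post; master nodes would presumably need a second object with its own name and role label.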
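After that, the machine-config Cluster Operator in the cluster reported a status of False, and my attempts to repair it failed. So I tried another idea: overwrite registries.conf on every machine directly with this sample-registries.conf. Remember to restart CRI-O after overwriting: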
systemctl restart crio
If you want to be sure, run a pull directly on a node; if everything is fine, the image should come down.
podman pull registry.redhat.io/.....@sha256....
5. Knative
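With everything installed, I tried helloworld-go and hit an X509 problem. After a long search it turned out to be a known issue. I had always tested on the AWS public cloud before, so I had never run into it, but it reproduces every time once the sample program is placed in a local image registry.
See: https://github.com/knative/serving/issues/5126
The workaround is also brute-force: skip tag resolution for the local registry in the configmap. (The snippet below is for reference only; I made the change through the web console.)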
oc -n knative-serving edit configmap config-deployment

apiVersion: v1
data:
  queueSidecarImage: gcr.azk8s.cn/knative-releases/knative.dev/serving/cmd/queue@sha256:5ff357b66622c98f24c56bba0a866be5e097306b83c5e6c41c28b6e87ec64c7c
  registriesSkippingTagResolving: registry.example.com
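Once everything worked, I found that the way event sources are created had changed: CronJobSource is deprecated and can no longer be created, so I had to use the commands below instead.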
$ oc get inmemorychannel
NAME         READY   REASON   URL                                                      AGE
imc-msgtxr   True             http://imc-msgtxr-kn-channel.kn-demo.svc.cluster.local   24s

kn source ping create msgtxr-pingsource \
  --schedule="* * * * *" \
  --data="This message is from PingSource" \
  --sink=http://imc-msgtxr-kn-channel.kn-demo.svc.cluster.local
With that created, everything finally worked, and I finally got a chance to catch my breath and write this down. :(