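Skywalking is a distributed tracing application. For a detailed introduction, see skywalking.
Recently, one of our company's PHP applications stopped showing any data in the Skywalking backend.
After logging in to one of its servers, I found that the application could not register and reported an error right at startup.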
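First, let's go over the overall flow of the Skywalking PHP agent. The PHP extension registers the application and the instance at startup. Then, on every request, it intercepts the relevant calls and records them. The registration code lives in module_init in skywalking.c: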
    static void module_init() {
        application_instance = -100000;
        application_id = -100000;
        /* Register the application: at most 2 attempts, sleeping 1 s after a failure. */
        int i = 0;
        do {
            application_id = serviceRegister(SKYWALKING_G(grpc), SKYWALKING_G(app_code));
            if (application_id == -100000) {
                sleep(1);
            }
            i++;
        } while (application_id == -100000 && i <= 1);
        if (application_id == -100000) {
            sky_close = 1;
            return;
        }
        /* Collect the properties reported with the instance. */
        char *ipv4s = _get_current_machine_ip();
        char hostname[100] = {0};
        if (gethostname(hostname, sizeof(hostname)) < 0) {
            strcpy(hostname, "");
        }
        char *l_millisecond = get_millisecond();
        long millisecond = zend_atol(l_millisecond, strlen(l_millisecond));
        efree(l_millisecond);
        /* Register the instance: at most 4 attempts, sleeping 2 s after each failure. */
        i = 0;
        do {
            application_instance = serviceInstanceRegister(SKYWALKING_G(grpc), application_id, millisecond,
                                                           SKY_OS_NAME, hostname, getpid(), ipv4s);
            if (application_instance == -100000) {
                sleep(2);
            }
            i++;
        } while (application_instance == -100000 && i <= 3);
        if (application_instance == -100000) {
            sky_close = 1;
            php_error(E_WARNING, "skywalking: register service error");
            return;
        }
        php_error(E_WARNING, "skywalking: register service success");
    }
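Both loops in module_init follow the same pattern: call, sleep after a failure, and give up after a fixed number of attempts. The sketch below pulls that pattern into a hypothetical helper, `register_with_retry`, which is not part of the extension. The sleep function is injected so the example can run without waiting, and `fake_server` and `fake_sleep` are illustrative stand-ins. With the values from module_init, application registration corresponds to `register_with_retry(fn, ctx, 2, 1, sleep, &slept)` and instance registration to `register_with_retry(fn, ctx, 4, 2, sleep, &slept)`, so an instance gets at most 4 attempts and 8 seconds of sleeping before the extension gives up.

```c
#include <assert.h>
#include <unistd.h> /* sleep(), which matches sleep_fn */

/* Sentinel that module_init uses for "not registered". */
#define SKY_NOT_REGISTERED (-100000)

/* One registration attempt: returns an ID or SKY_NOT_REGISTERED. */
typedef int (*register_fn)(void *ctx);
/* Same shape as POSIX sleep(), so real code can pass sleep directly. */
typedef unsigned int (*sleep_fn)(unsigned int seconds);

/*
 * Call fn up to max_attempts times, sleeping delay_seconds after every failed
 * attempt (including the last one, as the do/while loops in module_init do).
 * The total time spent sleeping is stored in *slept_out if it is non-NULL.
 */
static int register_with_retry(register_fn fn, void *ctx, int max_attempts,
                               unsigned int delay_seconds, sleep_fn do_sleep,
                               unsigned int *slept_out) {
    assert(max_attempts > 0);
    int id = SKY_NOT_REGISTERED;
    unsigned int slept = 0;
    for (int i = 0; i < max_attempts; i++) {
        id = fn(ctx);
        if (id != SKY_NOT_REGISTERED) {
            break;
        }
        do_sleep(delay_seconds);
        slept += delay_seconds;
    }
    if (slept_out != NULL) {
        *slept_out = slept;
    }
    return id;
}

/* Hypothetical stand-in for the server: answers with an ID from call succeed_on on. */
struct fake_server {
    int calls;
    int succeed_on;
    int id;
};

static int fake_register(void *ctx) {
    struct fake_server *srv = ctx;
    srv->calls++;
    return srv->calls >= srv->succeed_on ? srv->id : SKY_NOT_REGISTERED;
}

/* Does not actually sleep, so the example runs instantly. */
static unsigned int fake_sleep(unsigned int seconds) {
    (void)seconds;
    return 0;
}
```

For example, a fake server that only answers on the 10th call makes `register_with_retry(fake_register, &srv, 4, 2, fake_sleep, &slept)` return `SKY_NOT_REGISTERED` after 4 calls and 8 seconds of simulated sleep.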
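As the code shows, the application is registered through serviceRegister and the instance through serviceInstanceRegister. The latter completes the registration by calling GreeterClient::serviceInstanceRegister, shown below: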
    int serviceInstanceRegister(int applicationid, long registertime, char *osname, char *hostname, int processno, char *ipv4s) {
        ServiceInstances request;
        ServiceInstance *s = request.add_instances();
        // Generate a UUID only if `uuid` has not been set yet; later calls reuse it.
        if (uuid == NULL) {
            std::string uuid_str = boost::uuids::to_string(boost_uuid);
            uuid = (char *) malloc(uuid_str.size() + 1);
            bzero(uuid, uuid_str.size() + 1);
            strncpy(uuid, uuid_str.c_str(), uuid_str.size() + 1);
        }
        s->set_serviceid(applicationid);
        s->set_instanceuuid(std::string(uuid));
        s->set_time(registertime);
        // Instance properties reported to the server.
        KeyStringValuePair *os = s->add_properties();
        KeyStringValuePair *host = s->add_properties();
        KeyStringValuePair *process = s->add_properties();
        KeyStringValuePair *ipv4 = s->add_properties();
        KeyStringValuePair *language = s->add_properties();
        os->set_key("os_name");
        os->set_value(osname);
        host->set_key("host_name");
        host->set_value(hostname);
        process->set_key("process_no");
        process->set_value(std::to_string(processno));
        ipv4->set_key("ipv4");
        ipv4->set_value(ipv4s);
        language->set_key("language");
        language->set_value("php");
        ServiceInstanceRegisterMapping reply;
        ClientContext context;
        Status status = stub_->doServiceInstanceRegister(&context, request, &reply);
        if (status.ok()) {
            // The reply maps UUIDs to instance IDs; look for our own UUID.
            for (int i = 0; i < reply.serviceinstances_size(); i++) {
                const KeyIntValuePair &kv = reply.serviceinstances(i);
                // std::cout << "Register Instance:"<< std::endl;
                // std::cout << kv.key() << ": " << kv.value() << std::endl;
                if (kv.key() == uuid) {
                    return kv.value();
                }
            }
        }
        // RPC failure, or the reply did not contain our UUID.
        return -100000;
    }
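In short, the function returns an ID only if the reply contains an entry keyed by the UUID it sent. A failed RPC and a successful RPC with an empty reply both end in -100000, so the return value alone cannot tell the two apart. The matching step can be modelled in plain C as below. `key_int_pair` is a simplified stand-in for the generated protobuf `KeyIntValuePair`, and `find_instance_id` is a hypothetical helper, not part of the extension.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKY_NOT_REGISTERED (-100000)

/* Simplified view of one entry in ServiceInstanceRegisterMapping. */
struct key_int_pair {
    const char *key; /* instance UUID sent by the agent */
    int value;       /* instance ID assigned by the server */
};

/*
 * Return the instance ID the server assigned to uuid, or SKY_NOT_REGISTERED
 * if the reply does not mention it (including when the reply is empty).
 */
static int find_instance_id(const struct key_int_pair *pairs, size_t count,
                            const char *uuid) {
    assert(uuid != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(pairs[i].key, uuid) == 0) {
            return pairs[i].value;
        }
    }
    return SKY_NOT_REGISTERED;
}
```

With the pair `7e22c317-e2e2-4f81-a53d-fe011013e0a3: 3386041` in the reply, the lookup returns 3386041; with an empty reply (`count == 0`) it returns -100000, exactly as if the RPC had failed.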
Using gdb breakpoints, I found that application registration succeeded while instance registration failed, so I added some logging to GreeterClient::serviceInstanceRegister:
    if (status.ok()) {
        std::cout << "size:" << reply.serviceinstances_size() << std::endl;
        for (int i = 0; i < reply.serviceinstances_size(); i++) {
            const KeyIntValuePair &kv = reply.serviceinstances(i);
            std::cout << "Register Instance:" << std::endl;
            std::cout << kv.key() << ": " << kv.value() << std::endl;
            if (kv.key() == uuid) {
                return kv.value();
            }
        }
    } else {
        // End the line and include gRPC's reason so the message is flushed and informative.
        std::cout << "instance register error: " << status.error_message() << std::endl;
    }
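At this point the client offered no further clues, so I had to turn to the server. The server is implemented in Java and is not easy to debug directly, so I set up a local environment to debug it. The server came up fine, but the PHP client simply would not compile: Skywalking depends on components such as protobuf and grpc, these components depend on specific versions of one another, and the official documentation does not describe those constraints. For a while I was stuck.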
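The colleague who used to maintain the server had left, so I had to bite the bullet and read the code myself. The registration entry point is RegisterServiceHandler::doServiceInstanceRegister: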
    @Override
    public void doServiceInstanceRegister(ServiceInstances request,
            StreamObserver<ServiceInstanceRegisterMapping> responseObserver) {
        ServiceInstanceRegisterMapping.Builder builder = ServiceInstanceRegisterMapping.newBuilder();
        request.getInstancesList().forEach(instance -> {
            ServiceInventory serviceInventory = serviceInventoryCache.get(instance.getServiceId());
            // Collect the properties reported by the agent.
            JsonObject instanceProperties = new JsonObject();
            List<String> ipv4s = new ArrayList<>();
            for (KeyStringValuePair property : instance.getPropertiesList()) {
                String key = property.getKey();
                switch (key) {
                    case HOST_NAME:
                        instanceProperties.addProperty(HOST_NAME, property.getValue());
                        break;
                    case OS_NAME:
                        instanceProperties.addProperty(OS_NAME, property.getValue());
                        break;
                    case LANGUAGE:
                        instanceProperties.addProperty(LANGUAGE, property.getValue());
                        break;
                    case "ipv4":
                        ipv4s.add(property.getValue());
                        break;
                    case PROCESS_NO:
                        instanceProperties.addProperty(PROCESS_NO, property.getValue());
                        break;
                }
            }
            instanceProperties.addProperty(IPV4S, ServiceInstanceInventory.PropertyUtil.ipv4sSerialize(ipv4s));
            // Instance name: <service name>-pid:<process_no>@<host_name>
            String instanceName = serviceInventory.getName();
            if (instanceProperties.has(PROCESS_NO)) {
                instanceName += "-pid:" + instanceProperties.get(PROCESS_NO).getAsString();
            }
            if (instanceProperties.has(HOST_NAME)) {
                instanceName += "@" + instanceProperties.get(HOST_NAME).getAsString();
            }
            int serviceInstanceId = serviceInstanceInventoryRegister.getOrCreate(instance.getServiceId(), instanceName,
                    instance.getInstanceUUID(), instance.getTime(), instanceProperties);
            // Only instances that already have an ID are put into the reply.
            if (serviceInstanceId != Const.NONE) {
                logger.info("register service instance id={} [UUID:{}]", serviceInstanceId, instance.getInstanceUUID());
                builder.addServiceInstances(KeyIntValuePair.newBuilder()
                        .setKey(instance.getInstanceUUID()).setValue(serviceInstanceId));
            }
        });
        responseObserver.onNext(builder.build());
        responseObserver.onCompleted();
    }
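Two details of this handler matter. The instance name is built from the service name, the process number and the host name, but getOrCreate is given the service ID and the UUID as well, and (as the next steps show) it looks instances up by those. And an instance only appears in the reply when getOrCreate returns something other than Const.NONE. The name-building part can be sketched in C as follows. `build_instance_name` is a hypothetical helper mirroring the string concatenation above, with NULL standing for a property the agent did not send.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the concatenation in doServiceInstanceRegister:
 * <service name>[-pid:<process_no>][@<host_name>].
 * pid or host may be NULL when the agent did not report that property.
 * Returns what snprintf returns; the name is truncated if out is too small.
 */
static int build_instance_name(char *out, size_t size, const char *service,
                               const char *pid, const char *host) {
    assert(out != NULL && size > 0 && service != NULL);
    return snprintf(out, size, "%s%s%s%s%s", service,
                    pid != NULL ? "-pid:" : "", pid != NULL ? pid : "",
                    host != NULL ? "@" : "", host != NULL ? host : "");
}
```

With a hypothetical service `php-app`, process 1234 and host `web01`, the result is `php-app-pid:1234@web01`.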
The key is this line, which produces the instance ID:
    int serviceInstanceId = serviceInstanceInventoryRegister.getOrCreate(instance.getServiceId(), instanceName,
            instance.getInstanceUUID(), instance.getTime(), instanceProperties);
Stepping into it:
    @Override
    public int getOrCreate(int serviceId, String serviceInstanceName, String uuid, long registerTime, JsonObject properties) {
        if (logger.isDebugEnabled()) {
            logger.debug("Get or create service instance by service instance name, service id: {}, service instance name: {},uuid: {}, registerTime: {}",
                    serviceId, serviceInstanceName, uuid, registerTime);
        }
        // Look the instance up by (serviceId, uuid).
        int serviceInstanceId = getServiceInstanceInventoryCache().getServiceInstanceId(serviceId, uuid);
        if (serviceInstanceId == Const.NONE) {
            // Unknown instance: hand a new inventory record to the stream processor.
            ServiceInstanceInventory serviceInstanceInventory = new ServiceInstanceInventory();
            serviceInstanceInventory.setServiceId(serviceId);
            serviceInstanceInventory.setName(serviceInstanceName);
            serviceInstanceInventory.setInstanceUUID(uuid);
            serviceInstanceInventory.setIsAddress(BooleanUtils.FALSE);
            serviceInstanceInventory.setAddressId(Const.NONE);
            serviceInstanceInventory.setRegisterTime(registerTime);
            serviceInstanceInventory.setHeartbeatTime(registerTime);
            serviceInstanceInventory.setProperties(properties);
            InventoryStreamProcessor.getInstance().in(serviceInstanceInventory);
        }
        // For a new instance this is still Const.NONE; the ID is generated later.
        return serviceInstanceId;
    }
The logic here is fairly clear. It first tries to get the instance ID from the cache:
    getServiceInstanceInventoryCache().getServiceInstanceId(serviceId, uuid);
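If that fails, the new instance is handed to a background task that generates the ID, and the method returns the value it already has, which is Const.NONE.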
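So a UUID the server has not seen never gets an ID on its first call: every call made before the background task finishes returns Const.NONE, and the handler sends back an empty reply. The hypothetical C model below imitates that behaviour. An in-memory "stored" table stands in for ES, a "pending" queue stands in for InventoryStreamProcessor, `SKY_NONE` is assumed to equal Const.NONE (0), and the IDs are plain counters rather than SkyWalking's real sequence.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKY_NONE 0 /* assumed to match Const.NONE */
#define MAX_INSTANCES 16
#define UUID_BUF 64

/* Hypothetical model: "stored" plays the ES index, "pending" the background queue. */
struct inventory {
    char stored_uuid[MAX_INSTANCES][UUID_BUF];
    int stored_id[MAX_INSTANCES];
    int stored_count;
    char pending_uuid[MAX_INSTANCES][UUID_BUF];
    int pending_count;
    int next_id;
};

static void inventory_init(struct inventory *inv) {
    memset(inv, 0, sizeof *inv);
    inv->next_id = 1; /* never hand out SKY_NONE as a real ID */
}

/* Read-only lookup, standing in for the cache/ES query. */
static int lookup(const struct inventory *inv, const char *uuid) {
    for (int i = 0; i < inv->stored_count; i++) {
        if (strcmp(inv->stored_uuid[i], uuid) == 0) {
            return inv->stored_id[i];
        }
    }
    return SKY_NONE;
}

/* Like getOrCreate: an unknown UUID is only queued, and SKY_NONE is returned. */
static int get_or_create(struct inventory *inv, const char *uuid) {
    int id = lookup(inv, uuid);
    if (id == SKY_NONE && inv->pending_count < MAX_INSTANCES) {
        snprintf(inv->pending_uuid[inv->pending_count++], UUID_BUF, "%s", uuid);
    }
    return id;
}

/* The background task: give each queued UUID an ID and store it once. */
static void flush_pending(struct inventory *inv) {
    for (int i = 0; i < inv->pending_count; i++) {
        const char *uuid = inv->pending_uuid[i];
        if (lookup(inv, uuid) == SKY_NONE && inv->stored_count < MAX_INSTANCES) {
            strcpy(inv->stored_uuid[inv->stored_count], uuid);
            inv->stored_id[inv->stored_count] = inv->next_id++;
            inv->stored_count++;
        }
    }
    inv->pending_count = 0;
}
```

Calling `get_or_create` twice for a new UUID returns `SKY_NONE` both times; only after `flush_pending` does the same call return an ID. A client that stops polling before the flush therefore never sees its ID.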
Following getServiceInstanceId further:
    if (Objects.isNull(serviceInstanceId) || serviceInstanceId == Const.NONE) {
        // Cache miss, or a cached NONE: ask the DAO.
        serviceInstanceId = getCacheDAO().getServiceInstanceId(serviceId, uuid);
        if (serviceId != Const.NONE) {
            serviceInstanceNameCache.put(ServiceInstanceInventory.buildId(serviceId, uuid), serviceInstanceId);
        }
    }
If the ID is not in the cache, it is fetched through the DAO:
    GetResponse response = getClient().get(ServiceInstanceInventory.INDEX_NAME, id);
    if (response.isExists()) {
        // The instance ID is stored in the document under RegisterSource.SEQUENCE.
        return (int) response.getSource().getOrDefault(RegisterSource.SEQUENCE, 0);
    } else {
        return Const.NONE;
    }
The DAO, in turn, reads it from the ES index serviceinstanceinventory.
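Note that the Java code caches the DAO result whenever serviceId is not NONE, even when the result itself is NONE, but a cached NONE is treated as a miss on the next call. Repeated requests with the same UUID therefore keep going to ES until the ID shows up there. A small C sketch of that cache-then-store lookup follows. A fixed-size array replaces the real cache, the serviceId check is left out, and `fake_es_lookup` is a hypothetical stand-in for the ES query.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKY_NONE 0 /* assumed to match Const.NONE */
#define CACHE_SLOTS 8

/* Looks an instance ID up in the backing store (ES in SkyWalking). */
typedef int (*dao_lookup_fn)(const char *key);

struct cache_entry {
    int used;
    char key[64];
    int id;
};

struct id_cache {
    struct cache_entry slots[CACHE_SLOTS];
    int dao_calls; /* how often the store was queried */
};

static struct cache_entry *find_entry(struct id_cache *c, const char *key) {
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (c->slots[i].used && strcmp(c->slots[i].key, key) == 0) {
            return &c->slots[i];
        }
    }
    return NULL;
}

static struct cache_entry *free_entry(struct id_cache *c) {
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (!c->slots[i].used) {
            return &c->slots[i];
        }
    }
    return NULL; /* cache full: this sketch simply stops caching */
}

/*
 * Mirrors getServiceInstanceId: a miss or a cached NONE goes to the store,
 * and the store's answer is cached even when it is NONE.
 */
static int cached_instance_id(struct id_cache *c, const char *key, dao_lookup_fn dao) {
    assert(key != NULL);
    struct cache_entry *e = find_entry(c, key);
    if (e != NULL && e->id != SKY_NONE) {
        return e->id;
    }
    int id = dao(key);
    c->dao_calls++;
    if (e == NULL) {
        e = free_entry(c);
    }
    if (e != NULL) {
        e->used = 1;
        snprintf(e->key, sizeof e->key, "%s", key);
        e->id = id;
    }
    return id;
}

/* Hypothetical stand-in for the ES query; change fake_es_id to simulate progress. */
static int fake_es_id = SKY_NONE;

static int fake_es_lookup(const char *key) {
    (void)key;
    return fake_es_id;
}
```

Once the background task has written the instance (simulated by setting `fake_es_id`), the next lookup finds it in ES, and later lookups are served from the cache without touching ES.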
To confirm this logic, I read the data from ES, and sure enough, the instance IDs were all registered there.
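Next, I confirmed it from the client side. Since instance IDs are written to ES, registering with a UUID that was registered before should succeed. So I changed the client code to register with a hard-coded UUID: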
    int serviceInstanceRegister(int applicationid, long registertime, char *osname, char *hostname, int processno, char *ipv4s) {
        ServiceInstances request;
        ServiceInstance *s = request.add_instances();
        // Debugging only: reuse a UUID that is already registered in ES.
        // strdup gives a writable copy instead of pointing char * at a string literal.
        uuid = strdup("7e22c317-e2e2-4f81-a53d-fe011013e0a3");
        if (uuid == NULL) {
            std::string uuid_str = boost::uuids::to_string(boost_uuid);
            uuid = (char *) malloc(uuid_str.size() + 1);
            bzero(uuid, uuid_str.size() + 1);
            strncpy(uuid, uuid_str.c_str(), uuid_str.size() + 1);
        }
        // ... the rest of the function is unchanged
Registration succeeded immediately:
    7e22c317-e2e2-4f81-a53d-fe011013e0a3
    size:1
    Register Instance:
    7e22c317-e2e2-4f81-a53d-fe011013e0a3: 3386041
    PHP Warning: skywalking: register service success in Unknown on line 0
    PHP Warning: skywalking: hook redis handler success in Unknown on line 0
    PHP Warning: skywalking: hook session handler success in Unknown on line 0
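Back to the original problem. The cause is now clear. For a UUID it has not seen before, the server's getOrCreate only hands the new instance to the background task and returns Const.NONE, so the reply contains no entry for that UUID; the ID can only be found in the cache or ES after the background task has written it. The client generates a fresh UUID each time it starts, so every startup goes through this path, and it gives up after four attempts spaced 2 seconds apart, which here was not long enough for the server to finish. A UUID that is already in ES, by contrast, is resolved on the first call. So how do we fix it? There are two options: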
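1. Increase the time the client waits during registration, for example to 100 seconds;
2. Record the UUID of the most recent successful registration, persist it, and reuse it directly on the next startup.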
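Option 2 could look roughly like the sketch below: read a saved UUID at startup and, only after serviceInstanceRegister has succeeded with it, write it back. The file path and the helpers `load_saved_uuid` and `save_uuid` are hypothetical; nothing in the code above reads or writes such a file. This version only checks the length of what it reads and is not safe against several processes writing the file at once. Writing to a temporary file and rename()-ing it over the old one would be more robust.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Length of a textual UUID such as 7e22c317-e2e2-4f81-a53d-fe011013e0a3. */
#define UUID_TEXT_LEN 36

/*
 * Read the UUID saved by an earlier run. Returns 1 and fills out
 * (UUID_TEXT_LEN + 1 bytes) if the file holds a UUID-sized string, else 0.
 */
static int load_saved_uuid(const char *path, char out[UUID_TEXT_LEN + 1]) {
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return 0; /* first start, or the file was removed */
    }
    size_t n = fread(out, 1, UUID_TEXT_LEN, f);
    fclose(f);
    out[n] = '\0';
    return n == UUID_TEXT_LEN;
}

/*
 * Persist a UUID. Meant to be called only after serviceInstanceRegister has
 * succeeded with it. Returns 1 on success, 0 on any I/O error.
 */
static int save_uuid(const char *path, const char *uuid) {
    assert(path != NULL && uuid != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        return 0;
    }
    int ok = fputs(uuid, f) >= 0;
    if (fclose(f) != 0) {
        ok = 0;
    }
    return ok;
}
```

In this scheme, the `if (uuid == NULL)` branch in serviceInstanceRegister would first try `load_saved_uuid` and only generate a new UUID if that fails, and module_init would call `save_uuid` once registration has succeeded.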
Because option 2 involves code changes, I went with option 1 to fix the problem for now.
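Option 1 amounts to raising the attempt count or sleep time in module_init. A variant that states the budget directly is to retry until a deadline instead of for a fixed count. The sketch below does that with a monotonic clock (`clock_gettime(CLOCK_MONOTONIC)`), retrying every `delay_seconds` until `budget_seconds` (for example 100) have elapsed. `register_until_deadline`, `retry_clock` and the fake clock and server are hypothetical names for illustration. Whichever form is used, module_init blocks while it waits, so in the worst case startup is delayed by the whole budget.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <unistd.h>

#define SKY_NOT_REGISTERED (-100000)

/* One registration attempt: returns an ID or SKY_NOT_REGISTERED. */
typedef int (*register_fn)(void *ctx);

/* Time source and pause, injected so the loop can be exercised with a fake clock. */
struct retry_clock {
    double (*now)(void *ctx);                       /* current time in seconds */
    void (*pause)(void *ctx, unsigned int seconds); /* wait this long */
    void *ctx;
};

/* Real clock: monotonic, so wall-clock adjustments do not change the budget. */
static double monotonic_now(void *ctx) {
    (void)ctx;
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

static void real_pause(void *ctx, unsigned int seconds) {
    (void)ctx;
    sleep(seconds);
}

/*
 * Call fn until it returns an ID, pausing delay_seconds between attempts,
 * and give up once another pause would go past budget_seconds.
 */
static int register_until_deadline(register_fn fn, void *fn_ctx,
                                   double budget_seconds, unsigned int delay_seconds,
                                   const struct retry_clock *clk) {
    assert(delay_seconds > 0);
    double deadline = clk->now(clk->ctx) + budget_seconds;
    for (;;) {
        int id = fn(fn_ctx);
        if (id != SKY_NOT_REGISTERED) {
            return id;
        }
        if (clk->now(clk->ctx) + delay_seconds > deadline) {
            return SKY_NOT_REGISTERED;
        }
        clk->pause(clk->ctx, delay_seconds);
    }
}

/* Hypothetical fake clock: time only moves when pause() is called. */
struct fake_clock {
    double t;
};

static double fake_now(void *ctx) { return ((struct fake_clock *)ctx)->t; }

static void fake_pause(void *ctx, unsigned int seconds) {
    ((struct fake_clock *)ctx)->t += seconds;
}

/* Hypothetical server whose background task makes the ID visible at ready_at. */
struct fake_async_server {
    const struct fake_clock *clock;
    double ready_at;
    int id;
};

static int fake_async_register(void *ctx) {
    const struct fake_async_server *srv = ctx;
    return srv->clock->t >= srv->ready_at ? srv->id : SKY_NOT_REGISTERED;
}
```

In the extension, the real clock would be `struct retry_clock real = {monotonic_now, real_pause, NULL};` passed as `register_until_deadline(fn, ctx, 100.0, 2, &real)`. With the fake clock, a server that becomes ready after 30 simulated seconds is picked up at t = 30, while one that needs 150 seconds is abandoned at t = 100.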