I recently ran into a MySQL connection problem: connecting to MySQL remotely failed with “ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet', system error: 0”, as shown below:
[root@DB-Server ~]# mysql -h 10.13.65.93 -u onecard -p
Enter password:
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet', system error: 0
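This test MySQL instance runs in a Docker container on Alibaba Cloud Kubernetes (K8s). Whenever the remote connection failed with the error above, the Docker container also reported an error (shown in a screenshot).
There are several common causes of “ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet'”:
1: When the network is faulty or latency is very high, exceeding the connection time limit (the system variable connect_timeout) causes this error. Establishing a connection between the MySQL client and the database takes a TCP three-way handshake followed by the MySQL protocol handshake. Normally this takes very little time, but if the network is faulty or times out, the handshake cannot complete. The connect_timeout parameter is the time, in seconds, that the server process mysqld waits for the connection to be established. If the handshake has still not completed within connect_timeout, the MySQL client receives an error. For more detail, see my earlier blog post “MySQL參數max_connect_errors分析釋疑”. In this case, however, there is no network latency problem, as shown below: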
[root@DB-Server ~]# ping 10.13.65.93
PING 10.13.65.93 (10.13.65.93) 56(84) bytes of data.
64 bytes from 10.13.65.93: icmp_seq=1 ttl=97 time=36.1 ms
64 bytes from 10.13.65.93: icmp_seq=2 ttl=97 time=36.3 ms
64 bytes from 10.13.65.93: icmp_seq=3 ttl=97 time=36.1 ms
64 bytes from 10.13.65.93: icmp_seq=4 ttl=97 time=36.0 ms
64 bytes from 10.13.65.93: icmp_seq=5 ttl=97 time=36.1 ms
64 bytes from 10.13.65.93: icmp_seq=6 ttl=97 time=36.2 ms
64 bytes from 10.13.65.93: icmp_seq=7 ttl=97 time=36.1 ms
64 bytes from 10.13.65.93: icmp_seq=8 ttl=97 time=36.2 ms
--- 10.13.65.93 ping statistics ---
8 packets transmitted, 8 received, 0% packet loss, time 7003ms
rtt min/avg/max/mdev = 36.092/36.205/36.354/0.205 ms
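ping only measures ICMP round trips, so it can help to time the TCP connect separately from the arrival of the server's first packet. The sketch below is an illustration, not part of the original troubleshooting: `parse_first_packet` and `probe` are hypothetical helper names, port 3306 is assumed, and the code reads only the first packet the server sends (an initial handshake or an error packet) without attempting to authenticate.

```python
import socket
import struct
import time


def parse_first_packet(data: bytes) -> dict:
    """Decode the first packet a MySQL server sends after the TCP connect."""
    if len(data) < 5:
        raise ValueError("need the 4-byte header plus at least one payload byte")
    # Header: 3-byte little-endian payload length, then a 1-byte sequence id.
    payload_len = int.from_bytes(data[0:3], "little")
    seq = data[3]
    payload = data[4:4 + payload_len]
    if not payload:
        raise ValueError("empty payload")
    first = payload[0]
    if first == 0xFF and len(payload) >= 3:
        # Error packet: 2-byte little-endian error code, then the message.
        code = struct.unpack_from("<H", payload, 1)[0]
        message = payload[3:].decode("utf-8", "replace")
        return {"kind": "error", "seq": seq, "code": code, "message": message}
    # Otherwise treat it as a handshake: protocol version byte,
    # followed by a NUL-terminated server version string.
    end = payload.find(b"\x00", 1)
    if end == -1:
        end = len(payload)
    version = payload[1:end].decode("ascii", "replace")
    return {"kind": "handshake", "seq": seq, "protocol": first,
            "server_version": version}


def probe(host: str, port: int = 3306, timeout: float = 5.0) -> dict:
    """Time the TCP connect and the arrival of the server's first packet."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        tcp_connect_s = time.monotonic() - start
        sock.settimeout(timeout)
        # A single recv is enough for a sketch; it may return partial data,
        # or b"" if the server closes the connection early.
        data = sock.recv(4096)
        first_packet_s = time.monotonic() - start
    packet = parse_first_packet(data) if len(data) >= 5 else None
    return {"tcp_connect_s": tcp_connect_s,
            "first_packet_s": first_packet_s,
            "packet": packet}
```

Calling `probe("10.13.65.93")` from the client machine would return both timings. If `tcp_connect_s` is small but `first_packet_s` is large, or `recv` raises a timeout, the delay is more likely in the server's connection handling than in the network. From the server's point of view each probe is an unfinished handshake, so it is best run sparingly.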
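2: DNS resolution can cause this problem. When a client connects and its host is not yet in the host cache, the server performs a DNS lookup on the client's IP address to obtain the client's domain or host name. If DNS resolution fails or is very slow, authenticating the connecting user runs into trouble. The skip-name-resolve option disables this name resolution. The official documentation explains: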
For each new client connection, the server uses the client IP address to check whether the client host name is in the host cache. If so, the server refuses or continues to process the connection request depending on whether or not the host is blocked. If the host is not in the cache, the server attempts to resolve the host name. First, it resolves the IP address to a host name and resolves that host name back to an IP address. Then it compares the result to the original IP address to ensure that they are the same. The server stores information about the result of this operation in the host cache. If the cache is full, the least recently used entry is discarded.
The server handles entries in the host cache like this:
- When the first TCP client connection reaches the server from a given IP address, a new cache entry is created to record the client IP, host name, and client lookup validation flag. Initially, the host name is set to NULL and the flag is false. This entry is also used for subsequent client TCP connections from the same originating IP.
In host_cache terms: when a new client connection arrives over TCP, MySQL Server creates a record for that IP in the host cache containing the IP, the host name, and the client lookup validation flag, which correspond to the IP, HOST, and HOST_VALIDATED columns of the host_cache table. On the first connection only the IP is known, not the host name, so HOST is set to NULL and HOST_VALIDATED to FALSE.
- If the validation flag for the client IP entry is false, the server attempts an IP-to-host name-to-IP DNS resolution. If that is successful, the host name is updated with the resolved host name and the validation flag is set to true. If resolution is unsuccessful, the action taken depends on whether the error is permanent or transient. For permanent failures, the host name remains NULL and the validation flag is set to true. For transient failures, the host name and validation flag remain unchanged. (In this case, another DNS resolution attempt occurs the next time a client connects from this IP.)
MySQL Server checks HOST_VALIDATED. If it is FALSE, the server attempts DNS resolution; on success it sets HOST to the resolved host name and HOST_VALIDATED to TRUE. On failure, it determines whether the cause is permanent or temporary. If permanent, HOST stays NULL and HOST_VALIDATED is set to TRUE, so later connections do not trigger another lookup. If temporary, HOST_VALIDATED stays FALSE, and the next connection triggers DNS resolution again.
- If an error occurs while processing an incoming client connection from a given IP address, the server updates the corresponding error counters in the entry for that IP. For a description of the errors recorded, see Section 26.12.17.1, “The host_cache Table”.
In short, any error while handling a connection from a given IP increments the matching error counter in that IP's entry; the host_cache table documentation lists the errors that are recorded.
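The update rules quoted above can be sketched as a small state machine. This is a simplified model written for illustration, not MySQL's actual implementation: `HostCacheEntry`, `Lookup`, and `on_connect` are hypothetical names, the resolver is passed in as a plain function, and error counters, blocked hosts, and eviction of least recently used entries are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class Lookup(Enum):
    """Possible outcomes of one IP-to-host-name-to-IP resolution attempt."""
    OK = "ok"
    PERMANENT_FAILURE = "permanent"
    TRANSIENT_FAILURE = "transient"


@dataclass
class HostCacheEntry:
    ip: str
    host: Optional[str] = None    # models the HOST column, NULL at first
    host_validated: bool = False  # models the HOST_VALIDATED column


# A resolver takes an IP and returns (outcome, host name or None).
Resolver = Callable[[str], Tuple[Lookup, Optional[str]]]


def on_connect(cache: Dict[str, HostCacheEntry], ip: str,
               resolve: Resolver) -> HostCacheEntry:
    """Apply the documented host cache rules to one incoming TCP connection."""
    entry = cache.get(ip)
    if entry is None:
        # First connection from this IP: create the entry with HOST = NULL
        # and HOST_VALIDATED = FALSE.
        entry = cache[ip] = HostCacheEntry(ip)
    if not entry.host_validated:
        outcome, name = resolve(ip)
        if outcome is Lookup.OK:
            entry.host = name
            entry.host_validated = True
        elif outcome is Lookup.PERMANENT_FAILURE:
            # HOST stays NULL, but no further lookups happen for this IP.
            entry.host_validated = True
        # Transient failure: leave the entry unchanged so that the next
        # connection from this IP triggers another resolution attempt.
    return entry
```

In this model a transient DNS failure costs a lookup on every new connection from the same IP, while a permanent failure costs it only once.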
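In this case, because MySQL runs in a Docker container on Alibaba Cloud Kubernetes (K8s), DNS resolution of our company's internal IP addresses does run into problems, and setting skip_name_resolve in the configuration file did resolve the issue. I thought I had found the cause. However, when I tested on two local machines (one running MySQL 5.6.41, the other MySQL 5.6.23), the error did not occur when connecting from 192.168.27.180 to DB-Server, even though the two servers cannot resolve each other through DNS (as a screenshot shows). Why? Even after I lowered connect_timeout to 2, the error still did not appear. MySQL connection handling is evidently more complicated than it looks on the surface, and at my current level of knowledge I cannot analyze it further.
In addition, while testing this case I found that with skip_name_resolve set to OFF, increasing connect_timeout also prevents the error: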
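One way to compare the two environments is to measure how long the IP-to-host-name-to-IP lookup takes on each database host, since a lookup that fails quickly may behave differently from one that stalls. That difference is an assumption here, not something verified in this case. The sketch below uses only Python's standard `socket` module; `timed_reverse_lookup` is a hypothetical helper, and the system resolver it calls may not behave exactly like the lookup mysqld performs.

```python
import socket
import time


def timed_reverse_lookup(ip, reverse=socket.gethostbyaddr,
                         forward=socket.gethostbyname_ex,
                         clock=time.monotonic):
    """Resolve ip -> host name -> IPs and report the result and elapsed time.

    The resolver functions are parameters so they can be replaced in tests.
    """
    start = clock()
    host = None
    confirmed = False
    error = None
    try:
        host = reverse(ip)[0]          # IP -> primary host name
        addresses = forward(host)[2]   # host name -> list of IPv4 addresses
        confirmed = ip in addresses    # does the name map back to the IP?
    except OSError as exc:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        error = repr(exc)
    return {"ip": ip, "host": host, "confirmed": confirmed,
            "error": error, "elapsed_s": clock() - start}
```

Running `timed_reverse_lookup("<client IP>")` inside the MySQL container and on a local test server, then comparing `elapsed_s` with the connect_timeout value, would show whether the lookup in the container is merely failing or is also slow.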
mysql> show variables like '%connect_timeout%';
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| connect_timeout | 10 |
+-----------------+-------+
1 row in set (0.01 sec)
mysql> set global connect_timeout=30;
Query OK, 0 rows affected (0.00 sec)
mysql>
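After that, connecting to the MySQL database from the client succeeded (shown in a screenshot), although the IP address shown was not the client's IP address but the Port IP.
In this situation, MySQL in Docker under Kubernetes (K8s) did not go down. By contrast, with connect_timeout=10 and skip_name_resolve not enabled, every remote connection caused MySQL in Docker under K8s to go down and restart. I therefore strongly suspect that the error appears because MySQL in Docker goes down and restarts during the connection. However, I do not know K8s well and the subject is too broad, so I cannot analyze the specific cause any further.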