Using paping to test connectivity, and website timeouts and connection failures caused by TCP settings

Reposted article, 2017-02-22 10:44, category: system tuning

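Today at work I ran into a strange problem. Strange because I had never seen anything like it before, a bit like someone who has never used a telephone picking one up and discovering that, with a single wire or no wire at all, they can talk to a person a thousand miles away.

First, a tool that colleagues at Microsoft Cloud introduced to me. It offers another way to test connectivity when a server blocks ping, that is, does not respond to ICMP packets:

1. Testing connectivity with paping (Linux):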

wget http://www.updateweb.cn/softwares/paping_1.5.5_x86-64_linux.tar.gz

wget https://zhangtaostorage.blob.core.chinacloudapi.cn/share/paping_1.5.5_x86-64_linux.tar.gz

This is a compressed archive. To extract it: tar zvxf paping_1.5.5_x86-64_linux.tar.gz

Usage: ./paping -p 80 -c 500 www.xxx.com (this example runs 500 connectivity tests against port 80 of the target)
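
On hosts where paping is not available, the same kind of check (can a TCP connection to a given port be opened at all?) can be approximated with standard tools. The sketch below is not paping and does not reproduce its output or timing. `tcp_probe` is a hypothetical helper name, and the sketch assumes that `bash` (for its `/dev/tcp` redirection) and the coreutils `timeout` command are installed.

```shell
# tcp_probe HOST PORT [COUNT]
# Try COUNT TCP connects to HOST:PORT and print a one-line summary.
# The exit status is 0 only if every attempt succeeded.
tcp_probe() (
    host=$1; port=$2; count=${3:-4}
    ok=0; failed=0; i=0
    while [ "$i" -lt "$count" ]; do
        # bash opens a TCP socket when a redirection targets
        # /dev/tcp/HOST/PORT; timeout caps each attempt at 3 seconds.
        if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "attempts=$count ok=$ok failed=$failed"
    [ "$failed" -eq 0 ]
)
```

For example, `tcp_probe www.xxx.com 80 10` makes ten attempts against port 80. A run that mixes successes and failures from the same client is the intermittent pattern discussed later in this article.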

2. Testing connectivity with psping (Windows):

PsPing download: http://www.updateweb.cn/softwares/PSTools.zip

PsPing download: http://technet.microsoft.com/en-us/sysinternals/jj729731

Put psping into the C:\Windows\system32 directory,

then run it in a cmd window: psping ipaddress:port

For example:

-----------------------------------------------------------------------

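Back to the main topic:

In the first screenshot you can see a "connection timed out" message, while another machine on the same network connects without any trouble. My first reaction was that a firewall or a network blacklist was blocking the traffic. After several teams investigated together, that guess turned out to be wrong.

The fix in the end was:

Check the kernel parameter settings of your Linux system with sysctl -a | grep tcp. Ours were: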

sysctl -w net.ipv4.tcp_timestamps=1

sysctl -w net.ipv4.tcp_tw_recycle=0

Change the Linux kernel parameters to:

sysctl -w net.ipv4.tcp_timestamps=0

sysctl -w net.ipv4.tcp_tw_recycle=0

That was all it took. The network was smooth again right away.

Appendix 1:

---------------------------------------------------------

PsPing v2.01 usage notes

By Mark Russinovich

Published: January 29, 2014

Introduction

PsPing implements Ping functionality, TCP ping, latency and bandwidth measurement.

Installation

Copy PsPing onto your executable path. Typing "psping" displays its usage syntax.

Using PsPing

PsPing implements Ping functionality, TCP ping, latency and bandwidth measurement. Use the following command-line options to show the usage for each test type:

Usage: psping -? [i|t|l|b]

-? I  Usage for ICMP ping.
-? T  Usage for TCP ping.
-? L  Usage for latency test.
-? B  Usage for bandwidth test.

ICMP ping usage: psping [[-6]|[-4]] [-h [buckets | <val1>,<val2>,...]] [-i <interval>] [-l <requestsize>[k|m]] [-q] [-t|-n <count>] [-w <count>] <destination>

-h  Print histogram (default bucket count is 20). If you specify a single argument, it's interpreted as a bucket count and the histogram will contain that number of buckets covering the entire time range of values. Specify a comma-separated list of times to create a custom histogram (e.g. "0.01,0.05,1,5,10").
-i  Interval in seconds. Specify 0 for fast ping.
-l  Request size. Append 'k' for kilobytes and 'm' for megabytes.
-n  Number of pings or append 's' to specify seconds e.g. '10s'.
-q  Don't output during pings.
-t  Ping until stopped with Ctrl+C and type Ctrl+Break for statistics.
-w  Warmup with the specified number of iterations (default is 1).
-4  Force using IPv4.
-6  Force using IPv6.

For high-speed ping tests use -q and -i 0.

TCP ping usage: psping [[-6]|[-4]] [-h [buckets | <val1>,<val2>,...]] [-i <interval>] [-l <requestsize>[k|m]] [-q] [-t|-n <count>] [-w <count>] <destination:destport>

For high-speed ping tests use -q and -i 0.

TCP and UDP latency usage:

server: psping [[-6]|[-4]] [-f] <-s source:sourceport>

client: psping [[-6]|[-4]] [-f] [-u] [-h [buckets | <val1>,<val2>,...]] [-r] <-l requestsize[k|m]> <-n count> [-w <count>] <destination:destport>

-f  Open source firewall port during the run.
-u  UDP (default is TCP).
-h  Print histogram (default bucket count is 20). If you specify a single argument, it's interpreted as a bucket count and the histogram will contain that number of buckets covering the entire time range of values. Specify a comma-separated list of times to create a custom histogram (e.g. "0.01,0.05,1,5,10").
-l  Request size. Append 'k' for kilobytes and 'm' for megabytes.
-n  Number of sends/receives. Append 's' to specify seconds e.g. '10s'.
-r  Receive from the server instead of sending.
-w  Warmup with the specified number of iterations (default is 5).
-4  Force using IPv4.
-6  Force using IPv6.
-s  Server listening address and port.

The server can serve both latency and bandwidth tests and remains active until you terminate it with Control-C.

TCP and UDP bandwidth usage:

server: psping [[-6]|[-4]] [-f] <-s source:sourceport>

client: psping [[-6]|[-4]] [-f] [-u] -b [-h [buckets | <val1>,<val2>,...]] [-r] <-l requestsize[k|m]> <-n count> [-i <outstanding>] [-w <count>] <destination:destport>

-f  Open source firewall port during the run.
-u  UDP (default is TCP).
-b  Bandwidth test.
-h  Print histogram (default bucket count is 20). If you specify a single argument, it's interpreted as a bucket count and the histogram will contain that number of buckets covering the entire time range of values. Specify a comma-separated list of times to create a custom histogram (e.g. "0.01,0.05,1,5,10").
-i  Number of outstanding I/Os (default is min of 16 and 2x CPU cores).
-l  Request size. Append 'k' for kilobytes and 'm' for megabytes.
-n  Number of sends/receives. Append 's' to specify seconds e.g. '10s'.
-r  Receive from the server instead of sending.
-w  Warmup for the specified iterations (default is 2x CPU cores).
-4  Force using IPv4.
-6  Force using IPv6.
-s  Server listening address and port.

The server can serve both latency and bandwidth tests and remains active until you terminate it with Control-C.

Examples

This command executes an ICMP ping test for 10 iterations with 3 warmup iterations:
psping -n 10 -w 3 marklap

To execute a TCP connect test, specify the port number. The following command executes connect attempts against the target as quickly as possible, only printing a summary when finished with the 100 iterations and 1 warmup iteration:
psping -n 100 -i 0 -q marklap:80

To configure a server for latency and bandwidth tests, simply specify the -s option and the source address and port the server will bind to:
psping -s 192.168.2.2:5000

A buffer size is required to perform a TCP latency test. This example measures the round trip latency of sending an 8KB packet to the target server, printing a histogram with 100 buckets when completed:
psping -l 8k -n 10000 -h 100 192.168.2.2:5000

This command tests bandwidth to a PsPing server listening at the target IP address with 10000 sends/receives (write the count as '10s' to run for 10 seconds instead) and produces a histogram with 100 buckets. Note that the test must run for at least one second after warmup for a histogram to generate. Simply add -u to have PsPing perform a UDP bandwidth test.
psping -b -l 8k -n 10000 -h 100 192.168.2.2:5000

---------------------------

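Appendix 2:

connect failures caused by tcp_tw_recycle and tcp_timestamps

    Recently a number of connect failures appeared in production. After analysis and experiments, we confirmed that they are related to the proc parameters tcp_tw_recycle and tcp_timestamps.
1. Symptoms
    First symptom: module A reaches service S through the NAT gateway successfully, while module B, also going through the NAT gateway, frequently fails to connect. A packet capture shows that service S received the SYN packet but never replied with a SYN-ACK. In addition, module A has TCP timestamps disabled while module B has them enabled.
    Second symptom: instances of module C (timestamps enabled) on different hosts reach the same service S through a NAT gateway with a single egress IP. Host C1 connects successfully, while host C2 fails.

2. Analysis
    The symptoms clearly point to TCP timestamps. The Linux 2.6.32 kernel source shows that when tcp_tw_recycle and tcp_timestamps are both enabled, the timestamps in socket connect requests arriving from the same source IP within 60s must be increasing.
    Source function: tcp_v4_conn_request(), the server-side handler for the SYN packet of the TCP three-way handshake.
    Source excerpt: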
       if (tmp_opt.saw_tstamp &&
            tcp_death_row.sysctl_tw_recycle &&
            (dst = inet_csk_route_req(sk, req)) != NULL &&
            (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
            peer->v4daddr == saddr) {
            if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
                (s32)(peer->tcp_ts - req->ts_recent) >
                            TCP_PAWS_WINDOW) {
                NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
                goto drop_and_release;
            }
        }
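        tmp_opt.saw_tstamp: the connection request carries the TCP timestamp option
        sysctl_tw_recycle: tcp_tw_recycle is enabled on the local system
        TCP_PAWS_MSL: 60s; this condition means the last TCP communication from this source IP happened within the past 60s
        TCP_PAWS_WINDOW: 1; this condition means the timestamp of the last TCP communication from this source IP is greater than the timestamp of the current one

    Analysis: hosts client1 and client2 reach serverN through a NAT gateway (one IP address). Because the timestamp value is derived from the time since system boot, client1 and client2 carry different timestamps. Given the SYN handling code above, when tcp_tw_recycle and tcp_timestamps are both enabled, the host with the larger timestamp connects to serverN successfully, while the host with the smaller timestamp fails.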
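
The condition in the excerpt can be replayed by hand to see why the host with the smaller timestamp loses. The following is a rough sketch in shell arithmetic, not kernel code: `paws_would_drop` is a hypothetical name, its four arguments stand in for get_seconds(), peer->tcp_ts_stamp, peer->tcp_ts and req->ts_recent, and the sample numbers in the usage note are invented.

```shell
# paws_would_drop NOW LAST_SEEN PEER_TS REQ_TS
#   NOW, LAST_SEEN: wall-clock seconds (get_seconds(), peer->tcp_ts_stamp)
#   PEER_TS: last timestamp remembered for this source IP (peer->tcp_ts)
#   REQ_TS:  timestamp carried by the incoming SYN (req->ts_recent)
# Exit status 0 means the quoted condition would drop the SYN.
paws_would_drop() (
    TCP_PAWS_MSL=60
    TCP_PAWS_WINDOW=1
    # Emulate the (s32) cast: reduce the difference modulo 2^32, then
    # map the upper half of that range onto negative numbers.
    diff=$(( ($3 - $4) & 4294967295 ))
    if [ "$diff" -ge 2147483648 ]; then
        diff=$(( diff - 4294967296 ))
    fi
    [ "$1" -lt $(( $2 + TCP_PAWS_MSL )) ] && [ "$diff" -gt "$TCP_PAWS_WINDOW" ]
)
```

Suppose client1 (timestamp 5000000) was seen at second 100 and client2 (timestamp 1000) sends a SYN from the same NAT address at second 110: `paws_would_drop 110 100 5000000 1000` succeeds, so the SYN is dropped. In the opposite order, `paws_would_drop 110 100 1000 5000000` fails, and so does `paws_would_drop 161 100 5000000 1000`, because the 60s window has passed.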

    Parameters: /proc/sys/net/ipv4/tcp_timestamps - turns the timestamp option on or off
                /proc/sys/net/ipv4/tcp_tw_recycle - shortens the timeout after which TIME_WAIT sockets are released
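
The drop path in the excerpt increments LINUX_MIB_PAWSPASSIVEREJECTED before discarding the SYN. On kernels of that generation this counter should appear as the PAWSPassive field of the TcpExt lines in /proc/net/netstat, so a rising value on the server is a useful hint that this check is rejecting connections. The helper below is a minimal sketch for reading one TcpExt field; `tcpext_counter` is a hypothetical name, and kernels that no longer have tcp_tw_recycle may not expose the field at all, in which case nothing is printed.

```shell
# tcpext_counter NAME
# Read /proc/net/netstat-style text on stdin and print the value of the
# TcpExt counter called NAME. The format is a header line of field names
# followed by a line of values in the same order.
tcpext_counter() {
    awk -v name="$1" '
        $1 == "TcpExt:" && !have_header {
            for (i = 2; i <= NF; i++) col[$i] = i
            have_header = 1
            next
        }
        $1 == "TcpExt:" && have_header {
            if (name in col) print $(col[name])
            exit
        }
    '
}
```

Typical use on the server: `tcpext_counter PAWSPassive < /proc/net/netstat`, run twice a few seconds apart while the failing client retries.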

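3. Solution
    echo 0 > /proc/sys/net/ipv4/tcp_tw_recycle;
    tcp_tw_recycle is off by default, but quite a few servers turn it on to improve performance.
    To solve the problem above, I recommend disabling tcp_tw_recycle rather than timestamps: with TCP timestamps disabled, enabling tcp_tw_recycle has no effect, whereas TCP timestamps can be enabled and work on their own.
    Source function: tcp_time_wait()
    Source excerpt: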
        if (tcp_death_row.sysctl_tw_recycle && tp->rx_opt.ts_recent_stamp)
            recycle_ok = icsk->icsk_af_ops->remember_stamp(sk);
        ......

        if (timeo < rto)
            timeo = rto;

        if (recycle_ok) {
            tw->tw_timeout = rto;
        } else {
            tw->tw_timeout = TCP_TIMEWAIT_LEN;
            if (state == TCP_TIME_WAIT)
                timeo = TCP_TIMEWAIT_LEN;
        }

        inet_twsk_schedule(tw, &tcp_death_row, timeo,
                   TCP_TIMEWAIT_LEN);

    When timestamps and tw_recycle are both enabled, the timeout after which a TIME_WAIT socket is released depends on the rto; otherwise the timeout is TCP_TIMEWAIT_LEN, i.e. 60s.
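
    The choice made in the excerpt can be condensed into a few lines. This is only an illustration of the branch on recycle_ok: the kernel works in jiffies, while the sketch assumes milliseconds, and `tw_timeout_ms` is a hypothetical name.

```shell
# tw_timeout_ms RECYCLE_OK RTO_MS
# Print the TIME_WAIT lifetime in milliseconds.
#   RECYCLE_OK: 1 if tw_recycle is on, a timestamp was seen and
#               remember_stamp() succeeded; 0 otherwise
#   RTO_MS:     the rto value from the excerpt, in milliseconds
tw_timeout_ms() {
    TCP_TIMEWAIT_LEN_MS=60000   # 60s, as stated in the text
    if [ "$1" -eq 1 ]; then
        echo "$2"               # lifetime follows the rto
    else
        echo "$TCP_TIMEWAIT_LEN_MS"
    fi
}
```

    With an rto of 700 ms, `tw_timeout_ms 1 700` prints 700 while `tw_timeout_ms 0 700` prints 60000, which shows why the option looked attractive on busy servers.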

    The kernel documentation describes this parameter as follows:
    tcp_tw_recycle - BOOLEAN
    Enable fast recycling TIME-WAIT sockets. Default value is 0.
    It should not be changed without advice/request of technical
    experts.

Original link: http://blog.sina.com.cn/u/2015038597

-----------------------------

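Appendix 3:

I. Symptoms

1. HTTP access to the site from the company intranet:

Linux hosts fail: curl and packet captures show that the server does not respond to requests from Linux clients, no TCP connection can be established, and the browser reports "unable to connect to the server".

Windows hosts work normally.

2. Degraded HTTP access quality:

The 基調 monitoring data showed that access quality dropped after the new architecture went live, mainly as follows:

2.1. Visitors see "unable to connect to the server"

2.2. Only a few people hit the failure, and not on every visit during the day; it comes and goes

II. Troubleshooting

I went straight to Google and searched for the keywords "服務器無法建立TCP連接" (server cannot establish a TCP connection).

A few pages into the results, I found a case that looked exactly like what we were seeing on our intranet, but for various reasons (first, my weak background in this area; second, no time to verify the configuration) I did not follow it up.

The problem then dragged on for a very long time, and all along we assumed that an internal device was at fault.

Eventually, out of options, I took the risk of setting "net.ipv4.tcp_timestamps = 0" in production. After testing, the fault was gone: the previously failing machine now connected on every attempt!

I still do not fully understand the mechanism, only the general idea: among users who go online through NAT (sharing an egress IP address with others), if your timestamp is smaller than someone else's, the server will not respond to your TCP requests. To bypass this, set net.ipv4.tcp_timestamps = 0 (in /etc/sysctl.conf).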
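
A value set with sysctl -w lasts only until the next reboot, which is why the setting belongs in /etc/sysctl.conf. The helper below is one possible way to write it there without leaving duplicate entries behind; `persist_sysctl` is a hypothetical name and the approach (filter out old assignments, append the new one) is an assumption, not something taken from the original posts.

```shell
# persist_sysctl FILE KEY VALUE
# Rewrite FILE so that it contains exactly one "KEY = VALUE" line,
# keeping every other line (including comments) unchanged.
persist_sysctl() (
    file=$1; key=$2; value=$3
    tmp=$(mktemp) || exit 1
    if [ -f "$file" ]; then
        # Drop lines whose name part, with blanks removed, equals KEY.
        awk -F= -v k="$key" '
            { name = $1; gsub(/[ \t]/, "", name); if (name != k) print }
        ' "$file" > "$tmp"
    fi
    echo "$key = $value" >> "$tmp"
    # Write through the existing file so its owner and mode are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
)
```

Run as root, `persist_sysctl /etc/sysctl.conf net.ipv4.tcp_timestamps 0 && sysctl -p` stores the value and applies it to the running kernel, since sysctl -p reloads /etc/sysctl.conf.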

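III. Summary

Later, while studying the topic further, I came across a more detailed blog post. It explains things thoroughly and also raises new questions:

====== Notes ======

In fact, Linux servers do not perform this timestamp check by default. Whether Linux enables the behavior depends on both tcp_timestamps and tcp_tw_recycle. Since tcp_timestamps is on by default, turning tcp_tw_recycle on is what actually activates it.

So what is net.ipv4.tcp_tw_recycle? From a quick search, it is essentially the parameter that controls recycling of TIME_WAIT connections.

When net.ipv4.tcp_timestamps is left unset (it defaults to on) and net.ipv4.tcp_tw_recycle is also turned on, this nasty error appears, but note that it only shows up in NAT environments. And yet most blogs, and some well-known experts too, have recommended turning net.ipv4.tcp_tw_recycle on ...

====== Notes ======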
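
The rule in these notes (timestamps on plus tw_recycle on equals trouble behind NAT) is easy to turn into a quick server-side check. The sketch below reads `sysctl`-style "key = value" lines on stdin; `tw_recycle_risk` is a hypothetical name. Linux 4.12 and later removed tcp_tw_recycle, so on such kernels the key is simply absent and the check reports OK.

```shell
# tw_recycle_risk
# Read "key = value" lines (sysctl output format) on stdin and print
# RISK when tcp_timestamps is non-zero and tcp_tw_recycle is 1,
# otherwise print OK. Missing keys count as 0.
tw_recycle_risk() {
    awk -F' *= *' '
        $1 == "net.ipv4.tcp_timestamps" { ts = $2 }
        $1 == "net.ipv4.tcp_tw_recycle" { rc = $2 }
        END {
            if (ts > 0 && rc == 1) print "RISK"
            else print "OK"
        }
    '
}
```

On a live server: `sysctl net.ipv4.tcp_timestamps net.ipv4.tcp_tw_recycle 2>/dev/null | tw_recycle_risk`.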

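IV. Open items

1. (Unverified) Whether tw_recycle really stops working once timestamps are disabled

2. (Unverified) A different way to deal with too many TIME_WAIT connections: net.ipv4.tcp_max_tw_buckets = 10000 sets an upper limit. The downside is that the system log will report: TCP: time wait bucket table overflow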
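
Before picking a limit for the second item, it helps to know how many TIME_WAIT sockets the server actually holds. A minimal sketch, assuming the `ss` tool from iproute2, whose `ss -tan` output puts the socket state in the first column; `count_time_wait` is a hypothetical name.

```shell
# count_time_wait
# Read `ss -tan` output on stdin and print the number of sockets whose
# state column is TIME-WAIT (prints 0 when there are none).
count_time_wait() {
    awk '$1 == "TIME-WAIT" { n++ } END { print n + 0 }'
}
```

Comparing `ss -tan | count_time_wait` with `sysctl -n net.ipv4.tcp_max_tw_buckets` shows how close the system is to the limit at which the overflow message starts to appear.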
