1. Program description
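This program does two things. The first is reading a Wireshark capture file (packets could equally be captured straight from a network interface by swapping one function), which is done with pyshark. pyshark is a Python wrapper around tshark, and tshark can be thought of as the command-line version of Wireshark; it is installed together with Wireshark.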
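As a rough illustration of "swapping one function", the sketch below chooses between pyshark.FileCapture and pyshark.LiveCapture. The helper name open_capture and its live flag are hypothetical and not part of the original program; treat it as a minimal sketch, not a tested capture tool.

```python
def open_capture(source, tshark_path=None, display_filter=None, live=False):
    """Hypothetical helper: open a capture file, or a live interface when live=True."""
    # Imported inside the function so pyshark is only needed when the helper is called.
    import pyshark

    if live:
        # `source` is an interface name here, for example "eth0".
        return pyshark.LiveCapture(interface=source,
                                   tshark_path=tshark_path,
                                   display_filter=display_filter)
    # `source` is the path of a capture file here.
    return pyshark.FileCapture(input_file=source,
                               tshark_path=tshark_path,
                               display_filter=display_filter)
```

A file capture can be iterated directly, as the program below does. A live capture is normally consumed with capture.sniff_continuously(packet_count=...) instead.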
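The second is following a stream. In Wireshark a followed stream appears as a filter such as "tcp.stream eq 70". When I wrote this I did not know how a value like 70 is obtained, but according to material online it is determined by the 4-tuple [IP address A, TCP port A, IP address B, TCP port B]: packets that share these four values get the same tcp.stream and are treated as one stream. (The number itself is a stream index that Wireshark assigns to TCP conversations in the order they first appear in the capture, so it is not a function of the four values alone.) Put the other way round, "tcp.stream eq 70" is in effect equivalent to "ip.addr == ip_a and tcp.port == port_a and ip.addr == ip_b and tcp.port == port_b", and that is the form used here to follow a telnet stream.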
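The relationship between the 4-tuple and the stream number can be sketched in plain Python. The helper names four_tuple_filter, flow_key and assign_stream_indexes are hypothetical. The last function is only a simplified model of numbering conversations by first appearance; Wireshark itself handles more cases (for example a 4-tuple reused by a later connection).

```python
def four_tuple_filter(ip_a, port_a, ip_b, port_b, protocol=None):
    """Build the display filter that stands in for 'tcp.stream eq N'."""
    expr = "ip.addr == %s and tcp.port == %s and ip.addr == %s and tcp.port == %s" % (
        ip_a, port_a, ip_b, port_b)
    if protocol:
        # Restrict to one application protocol, e.g. "telnet".
        expr = "%s and %s" % (protocol, expr)
    return expr


def flow_key(ip_a, port_a, ip_b, port_b):
    """Direction-independent key: A->B and B->A give the same key."""
    return tuple(sorted([(ip_a, int(port_a)), (ip_b, int(port_b))]))


def assign_stream_indexes(four_tuples):
    """Simplified model: number each conversation in order of first appearance."""
    indexes = {}
    result = []
    for four_tuple in four_tuples:
        key = flow_key(*four_tuple)
        if key not in indexes:
            indexes[key] = len(indexes)
        result.append(indexes[key])
    return result
```

In this model both directions of one connection share an index, which is why a filter on the four values selects the same packets as a filter on the index.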
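The reason I keep saying "following a telnet stream" rather than just "following a stream" is that there does not seem to be a uniform way to get application-layer content across protocols. Here telnet data is read with tmp_packet[highest_layer_name].get_field('data'), but HTTP needs tmp_packet['http'].file_data, and FTP and other protocols expose their content through yet other attributes.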
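One way to cope with the per-protocol differences is a small dispatch table keyed by the highest layer name. This is a minimal sketch that covers only the two accessors mentioned above; the names PAYLOAD_EXTRACTORS and extract_payload are hypothetical, and other protocols would need their own entries.

```python
def _telnet_payload(layer):
    # Telnet data is read through get_field('data'), as in the program below.
    return layer.get_field("data")


def _http_payload(layer):
    # HTTP bodies are exposed as the file_data attribute; None if it is absent.
    return getattr(layer, "file_data", None)


# Maps a packet's highest_layer name to the function that extracts its payload.
PAYLOAD_EXTRACTORS = {
    "TELNET": _telnet_payload,
    "HTTP": _http_payload,
}


def extract_payload(packet):
    """Return the application-layer payload, or None for unsupported protocols."""
    layer_name = packet.highest_layer
    extractor = PAYLOAD_EXTRACTORS.get(layer_name.upper())
    if extractor is None:
        return None
    return extractor(packet[layer_name])
```

Adding support for another protocol then means adding one function and one dictionary entry, without touching the stream-following loop.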
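One more point: each filtering pass works by writing a display_filter and re-reading the capture file. The alternatives would be to read all packets in and filter them in my own code (in practice more complicated and slower than re-reading with a display_filter), or to apply a second display_filter to an already filtered result (as far as I can tell tshark does not support this, and Wireshark itself also appears to re-read the capture file every time a filter expression is applied).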
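For comparison, the first alternative (filtering in your own code after reading every packet) would look roughly like the sketch below. The helper names same_flow and filter_flow are hypothetical, and the code only assumes packet objects with ip.src, ip.dst, tcp.srcport and tcp.dstport attributes, as pyshark packets have.

```python
def same_flow(packet, ip_a, port_a, ip_b, port_b):
    """True if the packet travels between the two endpoints, in either direction."""
    ip = getattr(packet, "ip", None)
    tcp = getattr(packet, "tcp", None)
    if ip is None or tcp is None:
        # Not an IPv4/TCP packet (ARP, UDP and so on).
        return False
    src = (ip.src, str(tcp.srcport))
    dst = (ip.dst, str(tcp.dstport))
    end_a = (ip_a, str(port_a))
    end_b = (ip_b, str(port_b))
    return (src, dst) in ((end_a, end_b), (end_b, end_a))


def filter_flow(packets, ip_a, port_a, ip_b, port_b):
    """Keep only the packets that belong to the given flow."""
    return [p for p in packets if same_flow(p, ip_a, port_a, ip_b, port_b)]
```

Every condition that a display filter expresses in one line has to be re-implemented this way, and every packet in the file has to be parsed into Python objects first, which matches the trade-off described above.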
The output of a run looks like this:

2. Program source code
import pyshark


class wireshark_analysis_script():

    # Thin wrapper around pyshark.FileCapture
    def read_packets_from_file(self, packets_file_path, tshark_path, display_filter):
        packets_file_obj = pyshark.FileCapture(input_file=packets_file_path, tshark_path=tshark_path, display_filter=display_filter)
        return packets_file_obj

    # Yield the {ip_server, ip_client, port_server, port_client} 4-tuple of every packet passed in.
    # This assumes the packets matched by the first filter were sent by the server.
    def get_target_client_ip_port(self, packets_file_obj):
        for tmp_packet in packets_file_obj:
            ip_server = tmp_packet.ip.src
            port_server = tmp_packet.tcp.srcport
            ip_client = tmp_packet.ip.dst
            port_client = tmp_packet.tcp.dstport
            yield {"ip_server": ip_server, "port_server": port_server, "ip_client": ip_client, "port_client": port_client}

    # Read the application-layer data of every packet passed in and print it
    def follow_tcp_stream(self, packets_file_obj, ip, port):
        for tmp_packet in packets_file_obj:
            highest_layer_name = tmp_packet.highest_layer
            if ((tmp_packet.ip.dst == ip) and (tmp_packet.tcp.dstport == port)):
                print("server(%s:%s)->client(%s:%s): %s" % (tmp_packet.ip.src, tmp_packet.tcp.srcport, tmp_packet.ip.dst, tmp_packet.tcp.dstport, tmp_packet[highest_layer_name].get_field('data')))
            elif ((tmp_packet.ip.src == ip) and (tmp_packet.tcp.srcport == port)):
                print("client(%s:%s)->server(%s:%s): %s" % (tmp_packet.ip.src, tmp_packet.tcp.srcport, tmp_packet.ip.dst, tmp_packet.tcp.dstport, tmp_packet[highest_layer_name].get_field('data')))


if __name__ == '__main__':
    # Path of the Wireshark capture file to read
    packets_file_path = 'F:\\PycharmProjects\\telnet\\pyshark_pack'
    # Path of the tshark executable; tshark is installed with Wireshark
    tshark_path = 'D:\\tools\\Wireshark\\tshark.exe'
    # Filter expression, written exactly as it would be in Wireshark
    first_step_filter = 'telnet contains "HiLinux"'
    # Holds the IPs and ports of the streams to follow
    target_client_ip_port = []

    # Instantiate the class
    wireshark_analysis_script_instance = wireshark_analysis_script()
    # Use first_step_filter to pick out the packets whose streams should be followed
    first_step_obj = wireshark_analysis_script_instance.read_packets_from_file(packets_file_path, tshark_path, first_step_filter)
    # Extract the IPs and ports from those packets. The generator is lazy,
    # so it is turned into a list before the capture is closed.
    target_client_ip_port = list(wireshark_analysis_script_instance.get_target_client_ip_port(first_step_obj))
    first_step_obj.close()

    # Loop over the ip+port combinations to follow
    for target_client_ip_port_temp in target_client_ip_port:
        ip_server = target_client_ip_port_temp['ip_server']
        port_server = target_client_ip_port_temp['port_server']
        ip_client = target_client_ip_port_temp['ip_client']
        port_client = target_client_ip_port_temp['port_client']
        # This is the key to following a stream: packets that share the same
        # {ip_server, ip_client, port_server, port_client} 4-tuple are treated as one stream.
        # Following a stream usually means following application-layer data, so the application
        # protocol is added to the filter to drop packets without application data (three-way
        # handshake, four-way teardown and so on). Here the target is telnet, so "telnet" is
        # added on top of the 4-tuple.
        second_step_filter = 'telnet and ip.addr == %s and ip.addr == %s and tcp.port == %s and tcp.port == %s' % (ip_server, ip_client, port_server, port_client)
        second_step_obj = wireshark_analysis_script_instance.read_packets_from_file(packets_file_path, tshark_path, second_step_filter)
        print("[%s:%s]" % (ip_client, port_client))
        # Print the application-layer data of every packet considered part of the same stream
        wireshark_analysis_script_instance.follow_tcp_stream(second_step_obj, ip_client, port_client)
        second_step_obj.close()
3. Using the same form as Wireshark [updated 2018-09-29]
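In the solution above we followed a stream by using "ip.addr == ip_a and tcp.port == port_a and ip.addr == ip_b and tcp.port == port_b" as an equivalent replacement for Wireshark's "tcp.stream eq 70", because at the time I did not know how a value like 70 is obtained for a given stream.
It turns out that pyshark exposes this value directly as the tcp.stream attribute, so a stream can be followed in exactly the same "tcp.stream eq 70" form that Wireshark uses. The program in section 2 can be rewritten equivalently as follows.
(Because the two forms are equivalent the output is the same, and because both re-parse the capture file the performance is about the same too. The point is only to show that a stream can be followed in the same form as in Wireshark.)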
import pyshark


class wireshark_analysis_script():

    # Thin wrapper around pyshark.FileCapture
    def read_packets_from_file(self, packets_file_path, tshark_path, display_filter):
        packets_file_obj = pyshark.FileCapture(input_file=packets_file_path, tshark_path=tshark_path, display_filter=display_filter)
        return packets_file_obj

    # Yield the {ip_server, ip_client, port_server, port_client} 4-tuple of every packet passed in,
    # together with its tcp.stream value.
    # This assumes the packets matched by the first filter were sent by the server.
    def get_target_client_ip_port(self, packets_file_obj):
        for tmp_packet in packets_file_obj:
            ip_server = tmp_packet.ip.src
            port_server = tmp_packet.tcp.srcport
            ip_client = tmp_packet.ip.dst
            port_client = tmp_packet.tcp.dstport
            stream_value = tmp_packet.tcp.stream
            yield {"ip_server": ip_server, "port_server": port_server, "ip_client": ip_client, "port_client": port_client, "stream_value": stream_value}

    # Read the application-layer data of every packet passed in and print it
    def follow_tcp_stream(self, packets_file_obj, ip, port):
        for tmp_packet in packets_file_obj:
            highest_layer_name = tmp_packet.highest_layer
            # A tcp.stream filter also returns handshake and teardown packets,
            # whose highest layer is TCP; skip them.
            if highest_layer_name != "TCP":
                if ((tmp_packet.ip.dst == ip) and (tmp_packet.tcp.dstport == port)):
                    print("server(%s:%s)->client(%s:%s): %s" % (tmp_packet.ip.src, tmp_packet.tcp.srcport, tmp_packet.ip.dst, tmp_packet.tcp.dstport, tmp_packet[highest_layer_name].get_field('data')))
                elif ((tmp_packet.ip.src == ip) and (tmp_packet.tcp.srcport == port)):
                    print("client(%s:%s)->server(%s:%s): %s" % (tmp_packet.ip.src, tmp_packet.tcp.srcport, tmp_packet.ip.dst, tmp_packet.tcp.dstport, tmp_packet[highest_layer_name].get_field('data')))


if __name__ == '__main__':
    # Path of the Wireshark capture file to read
    packets_file_path = 'F:\\PycharmProjects\\telnet\\pyshark_pack'
    # Path of the tshark executable; tshark is installed with Wireshark
    tshark_path = 'D:\\tools\\Wireshark\\tshark.exe'
    # Filter expression, written exactly as it would be in Wireshark
    first_step_filter = 'telnet contains "HiLinux"'
    # Holds the IPs and ports of the streams to follow
    target_client_ip_port = []

    # Instantiate the class
    wireshark_analysis_script_instance = wireshark_analysis_script()
    # Use first_step_filter to pick out the packets whose streams should be followed
    first_step_obj = wireshark_analysis_script_instance.read_packets_from_file(packets_file_path, tshark_path, first_step_filter)
    # Extract the IPs and ports from those packets. The generator is lazy,
    # so it is turned into a list before the capture is closed.
    target_client_ip_port = list(wireshark_analysis_script_instance.get_target_client_ip_port(first_step_obj))
    first_step_obj.close()

    # Loop over the ip+port combinations to follow
    for target_client_ip_port_temp in target_client_ip_port:
        # The tcp.stream value
        stream_value = target_client_ip_port_temp['stream_value']
        ip_client = target_client_ip_port_temp['ip_client']
        port_client = target_client_ip_port_temp['port_client']
        # The "tcp.stream eq 70" form. To exclude bare TCP packets, "and telnet" could simply be appended.
        second_step_filter = 'tcp.stream eq %s' % (stream_value)
        second_step_obj = wireshark_analysis_script_instance.read_packets_from_file(packets_file_path, tshark_path, second_step_filter)
        print("[%s:%s]" % (ip_client, port_client))
        # Print the application-layer data of every packet considered part of the same stream
        wireshark_analysis_script_instance.follow_tcp_stream(second_step_obj, ip_client, port_client)
        second_step_obj.close()
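If several packets of one connection match first_step_filter, get_target_client_ip_port yields one record per packet and the loop above follows the same stream more than once. A minimal sketch of deduplicating by stream_value and of building the filter, optionally with the "and telnet" suffix mentioned in the comment, is shown below. The helper names unique_streams and stream_filter are hypothetical and not part of the original program.

```python
def unique_streams(records):
    """Keep the first record for each distinct 'stream_value', in first-seen order."""
    seen = set()
    result = []
    for record in records:
        stream_value = record["stream_value"]
        if stream_value not in seen:
            seen.add(stream_value)
            result.append(record)
    return result


def stream_filter(stream_value, protocol=None):
    """Build 'tcp.stream eq N', optionally restricted to one application protocol."""
    expr = "tcp.stream eq %s" % stream_value
    if protocol:
        expr += " and %s" % protocol
    return expr
```

In the main block this would amount to looping over unique_streams(target_client_ip_port) and passing stream_filter(stream_value, "telnet") as the second filter.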
References:
http://kiminewt.github.io/pyshark/
https://www.wireshark.org/docs/wsug_html_chunked/
https://medium.com/@asfandyar.khalil/tcp-stream-in-pcap-file-using-python-6991a8e7b524
