https://rtodto.net/fragmented-ip-packet-forwarding/
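Only the first IP fragment carries the transport-layer (or ICMP) header; the remaining fragments carry only an IP header.
The data carried by each fragment, except the last one, is a multiple of 8 bytes, because the fragment offset is counted in 8-byte units.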
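The 8-byte rule follows from the offset field. The offset of fragment N+1 is the sum of the data lengths of fragments 0..N. That sum can only be written in 8-byte units if every non-final fragment's data length is a multiple of 8. Below is a small checker for this rule. The function name `fragment_offsets` is hypothetical and not taken from any library.

```python
def fragment_offsets(data_lengths):
    """Return the Fragment Offset field values (8-byte units) for fragments in order.

    data_lengths: bytes of IP payload carried by each fragment, first to last.
    Raises ValueError if a non-final fragment's length is not a multiple of 8,
    because the following fragment's offset could then not be encoded.
    """
    offsets = []
    position = 0                      # bytes from the start of the original IP payload
    for i, length in enumerate(data_lengths):
        offsets.append(position // 8)
        if i < len(data_lengths) - 1 and length % 8 != 0:
            raise ValueError(f"fragment {i} carries {length} bytes, not a multiple of 8")
        position += length
    return offsets


print(fragment_offsets([504, 500]))   # [0, 63]: 8-byte UDP header + 496 'A', then 500 'B'
print(fragment_offsets([1480, 1]))    # [0, 185]: the ping example further down
```

Only the last fragment may have an odd size, since nothing follows it.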
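Fragmentation raises two main questions. First, how do we decide whether a datagram needs to be fragmented? It does when it is longer than the MTU of the outgoing link (1500 bytes on Ethernet) and its flags allow fragmentation, i.e. the DF (Don't Fragment) bit is clear. Second, what has to be done when fragmenting? If fragmentation is not allowed, the IP layer (for example on a router) simply drops the datagram and sends an ICMP error (Destination Unreachable, "Fragmentation Needed and DF Set") back to the source.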
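The decision described above can be sketched as a three-way choice. This is a simplification that assumes one outgoing link with a known MTU. It ignores IP options, path-MTU caching, and the fact that a local sender may report an error to the application instead of generating ICMP. The names `Action` and `fragmentation_action` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    SEND = "send as is"
    FRAGMENT = "fragment"
    DROP_AND_NOTIFY = "drop and send ICMP Destination Unreachable, code 4"


def fragmentation_action(total_length: int, mtu: int, df: bool) -> Action:
    """Simplified IPv4 output decision for one datagram.

    total_length: IP header + payload, in bytes.
    mtu: MTU of the outgoing interface (1500 on Ethernet).
    df: value of the Don't Fragment flag.
    """
    if total_length <= mtu:
        return Action.SEND
    if df:
        # ICMP type 3 (Destination Unreachable), code 4
        # (Fragmentation Needed and DF Set) goes back to the source.
        return Action.DROP_AND_NOTIFY
    return Action.FRAGMENT


print(fragmentation_action(1501, 1500, df=False))   # Action.FRAGMENT
```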
[root@bogon scapy]# cat frag.py
#!/usr/bin/python
from scapy.all import *
sip="10.10.103.81"
dip="10.10.103.229"
payload="A"*496+"B"*500
packet=IP(src=sip,dst=dip,id=12345)/UDP(sport=1500,dport=1501)/payload
frags=fragment(packet,fragsize=500)
counter=1
for fragment in frags:
    print "Packet no#"+str(counter)
    print "==================================================="
    fragment.show() #displays each fragment
    counter+=1
    send(fragment)
[root@bogon scapy]# python frag.py
Packet no#1
===================================================
###[ IP ]###
  version = 4
  ihl = None
  tos = 0x0
  len = None
  id = 12345
  flags = MF
  frag = 0
  ttl = 64
  proto = udp
  chksum = None
  src = 10.10.103.81
  dst = 10.10.103.229
  \options \
###[ Raw ]###
     load = "\x05\xdc\x05\xdd\x03\xec\x1d'AAAA...AAAA"
.
Sent 1 packets.
Packet no#2
===================================================
###[ IP ]###
  version = 4
  ihl = None
  tos = 0x0
  len = None
  id = 12345
  flags =
  frag = 63
  ttl = 64
  proto = udp
  chksum = None
  src = 10.10.103.81
  dst = 10.10.103.229
  \options \
###[ Raw ]###
     load = 'BBBB...BBBB'
.
Sent 1 packets.
[root@bogon scapy]#
[root@bogon ~]# tcpdump -i enahisic2i3 udp and host 10.10.103.229 -env
tcpdump: listening on enahisic2i3, link-type EN10MB (Ethernet), capture size 262144 bytes
09:33:51.063862 48:57:02:64:ea:1e > Broadcast, ethertype IPv4 (0x0800), length 538: (tos 0x0, ttl 64, id 12345, offset 0, flags [+], proto UDP (17), length 524)
    10.10.103.81.vlsi-lm > 10.10.103.229.saiscm: UDP, bad length 996 > 496
09:33:53.323477 48:57:02:64:ea:1e > Broadcast, ethertype IPv4 (0x0800), length 534: (tos 0x0, ttl 64, id 12345, offset 504, flags [none], proto UDP (17), length 520)
    10.10.103.81 > 10.10.103.229: ip-proto-17
[root@bogon scapy]# netstat -i
Kernel Interface table
Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
br1 1450 647 0 0 0 9 0 0 0 BMRU
brq38c0d85e-bd 1500 2283 0 0 0 7 0 0 0 BMRU
brqf1411bad-10 1500 377062 0 0 0 410925 0 0 0 BMRU
enah2i3.1022 1500 130391 0 0 0 186 0 0 0 BMRU
enah2i3.1030 1500 187599 0 0 0 708 0 0 0 BMRU
enahisic2i0 1500 51252624 0 2558626 0 9121332 0 0 0 BMRU
enahisic2i1 1500 51019017 0 5601611 0 1453 0 0 0 BMRU
enahisic2i2 1500 28739907 0 211542 0 127 0 0 0 BMRU
enahisic2i3 1500 512773286 0 7104854 0 2347674 0 0 0 BMRU
enahisic2i3.222 1500 334 0 0 0 369 0 0 0 BMRU
enahisic2i3.310 1500 0 0 0 0 0 0 0 0 BMRU
enahisic2i3.900 1500 51252624 0 2558626 0 9121335 0 0 0 BMRU
lo 65536 984375886 0 0 0 984375886 0 0 0 LRU
tapae492383-36 1500 334 0 0 0 501 0 0 0 BMRU
tapb28a1d0d-a4 1500 4762 0 0 0 200041 0 0 0 BMRU
tapebc6bb55-29 1500 5 0 0 0 9 0 0 0 BMRU
veth1 1500 334 0 0 0 369 0 0 0 BMRU
virbr0 1500 0 0 0 0 0 0 0 0 BMU
vxlan100 1450 353 0 0 0 350 0 0 0 BMRU
[root@bogon scapy]#
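10.10.103.229 does not exist on this network, so ARP resolution fails and the frames above went out to the Ethernet broadcast address (Scapy falls back to the broadcast MAC when it cannot resolve one). Switch to a destination IP that exists: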
[root@bogon scapy]# ping 10.10.103.82
PING 10.10.103.82 (10.10.103.82) 56(84) bytes of data.
64 bytes from 10.10.103.82: icmp_seq=1 ttl=64 time=0.072 ms
64 bytes from 10.10.103.82: icmp_seq=2 ttl=64 time=0.067 ms
^C
--- 10.10.103.82 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1064ms
rtt min/avg/max/mdev = 0.067/0.069/0.072/0.008 ms
[root@bogon scapy]# cat frag.py
#!/usr/bin/python
from scapy.all import *
sip="10.10.103.81"
dip="10.10.103.82"
payload="A"*496+"B"*500
packet=IP(src=sip,dst=dip,id=12345)/UDP(sport=1500,dport=1501)/payload
frags=fragment(packet,fragsize=500)
counter=1
for fragment in frags:
    print "Packet no#"+str(counter)
    print "==================================================="
    fragment.show() #displays each fragment
    counter+=1
    send(fragment)
[root@bogon scapy]#
[root@bogon ~]# tcpdump -i enahisic2i3 host 10.10.103.82 -teennvv
tcpdump: listening on enahisic2i3, link-type EN10MB (Ethernet), capture size 262144 bytes
48:57:02:64:ea:1e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 10.10.103.82 tell 10.10.103.81, length 28
48:57:02:64:e7:ae > 48:57:02:64:ea:1e, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.10.103.82 is-at 48:57:02:64:e7:ae, length 46
48:57:02:64:ea:1e > 48:57:02:64:e7:ae, ethertype IPv4 (0x0800), length 538: (tos 0x0, ttl 64, id 12345, offset 0, flags [+], proto UDP (17), length 524)
    10.10.103.81.1500 > 10.10.103.82.1501: UDP, bad length 996 > 496
48:57:02:64:ea:1e > 48:57:02:64:e7:ae, ethertype IPv4 (0x0800), length 534: (tos 0x0, ttl 64, id 12345, offset 504, flags [none], proto UDP (17), length 520)
    10.10.103.81 > 10.10.103.82: ip-proto-17
48:57:02:64:e7:ae > 48:57:02:64:ea:1e, ethertype IPv4 (0x0800), length 590: (tos 0xc0, ttl 64, id 4394, offset 0, flags [none], proto ICMP (1), length 576)
    10.10.103.82 > 10.10.103.81: ICMP 10.10.103.82 udp port 1501 unreachable, length 556
        (tos 0x0, ttl 64, id 12345, offset 0, flags [none], proto UDP (17), length 1024)
        10.10.103.81.1500 > 10.10.103.82.1501: UDP, length 996
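The receiver first reassembles the fragments and only then hands the datagram to layer 4 (UDP). UDP finds nothing listening on port 1501, so the host answers with ICMP port unreachable. The datagram quoted inside the ICMP error has offset 0, flags [none], and length 1024 (20-byte IP header + 8-byte UDP header + 996 bytes of payload). That is the reassembled datagram, not one of the fragments. The "bad length 996 > 496" on the first fragment is expected: the UDP header announces 996 bytes of payload, but this fragment carries only 496 of them.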
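The reassemble-then-deliver step can be illustrated with a toy reassembler. It keys fragments on (source, destination, protocol, identification) and returns the full IP payload once the MF=0 fragment has arrived and there are no gaps. This is a teaching sketch, not how the kernel implements it: there are no timeouts, no overlap handling, and no memory limits. The class names `Fragment` and `Reassembler` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Fragment:
    src: str
    dst: str
    proto: int
    ident: int
    offset: int   # bytes from the start of the original IP payload
    mf: bool      # More Fragments flag
    data: bytes


@dataclass
class _Pending:
    pieces: Dict[int, bytes] = field(default_factory=dict)   # offset -> data
    total_len: Optional[int] = None   # known once the MF=0 fragment arrives


class Reassembler:
    """Toy IPv4 reassembly: no timeouts, no overlap resolution, no memory limits."""

    def __init__(self) -> None:
        self._pending: Dict[Tuple[str, str, int, int], _Pending] = {}

    def add(self, frag: Fragment) -> Optional[bytes]:
        """Store a fragment; return the complete IP payload once every byte is present."""
        key = (frag.src, frag.dst, frag.proto, frag.ident)
        entry = self._pending.setdefault(key, _Pending())
        entry.pieces[frag.offset] = frag.data
        if not frag.mf:
            entry.total_len = frag.offset + len(frag.data)
        if entry.total_len is None:
            return None                     # last fragment not seen yet
        buf = bytearray()
        for off in sorted(entry.pieces):
            if off != len(buf):             # gap or overlap: keep waiting
                return None
            buf += entry.pieces[off]
        if len(buf) != entry.total_len:
            return None
        del self._pending[key]
        return bytes(buf)                   # this is what gets handed to UDP/ICMP
```

Fed the two fragments from the capture above in either order, it returns 1004 bytes (UDP header + 996 bytes of payload). Only then would the UDP layer look up port 1501.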
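Let me explain what happens in each line. The captures in this walkthrough use the addresses 144.2.3.2 and 173.63.1.2 instead of the lab addresses above, but the fragments are the same.
1st packet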
IP (tos 0x0, ttl 64, id 12345, offset 0, flags [+], proto UDP (17), length 524) 144.2.3.2.1500 > 173.63.1.2.1501: UDP, length 996
We passed 496 'A' + 500 'B' = 996 bytes of payload to Scapy. Scapy put the first 496 bytes of it (all 'A's) into the first fragment, behind the 8-byte UDP header and the 20-byte IP header: 496 + 8 + 20 = 524 bytes in total. Pay attention to the port numbers. They are the UDP ports we set in the code, and they appear only because this fragment carries the UDP header. tcpdump reports length 996, the size of the whole UDP payload (the length field in the UDP header itself is 1004, which includes the 8-byte header), even though only 496 bytes of that payload are in this fragment. The ID is 12345, the same in the 1st and 2nd packet. The offset is 0 because this is the first fragment, and flags [+] shows that the More Fragments (MF) bit is set.
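The UDP header is visible in the Scapy run above: the first fragment's load starts with the 8 header bytes before the 'A's. The sketch below unpacks them with the standard `struct` module to confirm the ports and the 1004-byte length field. The helper name `parse_udp_header` is hypothetical.

```python
import struct


def parse_udp_header(segment: bytes):
    """Return (source port, destination port, length field, checksum) of a UDP header."""
    return struct.unpack("!HHHH", segment[:8])


# First 8 bytes of the first fragment's load, copied from the Scapy output.
first_fragment_start = b"\x05\xdc\x05\xdd\x03\xec\x1d'"
sport, dport, ulen, csum = parse_udp_header(first_fragment_start)
print(sport, dport, ulen, hex(csum))   # 1500 1501 1004 0x1d27
print(ulen - 8)                        # 996, the payload length tcpdump reports
```

The checksum belongs to the lab run (destination 10.10.103.229), so it will differ for other addresses. The ports and the length do not.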
2nd packet
IP (tos 0x0, ttl 64, id 12345, offset 504, flags [none], proto UDP (17), length 520) 144.2.3.2 > 173.63.1.2: ip-proto-17
This is where the real fun begins. Where are the port numbers? The second packet has none, because the UDP header travels only in the first fragment. You can see this from the packet size: 500 bytes of payload ('B's) + 20 bytes of IP header = 520 bytes, with no room for a UDP header. The offset is what marks this packet as a fragment, but why is it 504? The offset says how far this fragment's data starts from the beginning of the original, unfragmented IP payload, and that payload begins with the UDP header. So the offset is 8 bytes of UDP header + 496 'A's = 504 bytes. tcpdump prints the offset in bytes. In the header it is stored in 8-byte units, which is why Scapy showed frag = 63 (63 × 8 = 504).
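To see how the 504-byte offset ends up as 63 on the wire, here is a sketch that packs the DF/MF flags and the offset into the 16-bit flags/fragment-offset word of the IPv4 header. The layout is: reserved bit 15, DF bit 14, MF bit 13, offset in the low 13 bits. The function name `encode_flags_offset` is hypothetical.

```python
def encode_flags_offset(offset_bytes: int, mf: bool, df: bool = False) -> int:
    """Build the 16-bit IPv4 flags + fragment offset word (offset in 8-byte units)."""
    if offset_bytes % 8 != 0:
        raise ValueError("fragment offset must be a multiple of 8 bytes")
    units = offset_bytes // 8
    if units > 0x1FFF:
        raise ValueError("offset does not fit in 13 bits")
    return (int(df) << 14) | (int(mf) << 13) | units


# The two fragments from the Scapy run:
print(hex(encode_flags_offset(0, mf=True)))   # 0x2000: MF set, offset 0
print(encode_flags_offset(504, mf=False))     # 63, matching Scapy's frag = 63
```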
Note: if you open the same packets in Wireshark, its default setting "Reassemble fragmented IPv4 datagrams" shows the reassembled datagram on the last fragment. This misleads you into thinking the UDP header is in the second packet instead of the first one. Be careful!
IP protocol: IP fragmentation
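As the figure shows, when an IP datagram is larger than the MTU (Maximum Transmission Unit) of the link-layer frame, it is transmitted in fragments. Fragmentation can happen at the sender or at an intermediate router, and a datagram may be fragmented more than once along the way. The fragments are reassembled only at the final destination, by the kernel's IP module.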
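The IPv4 header has three fields dedicated to fragmentation: the 16-bit Identification, the 3-bit Flags (a reserved bit, DF = Don't Fragment, and MF = More Fragments), and the 13-bit Fragment Offset, counted in 8-byte units.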
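Each fragment of an IP datagram has its own IP header. All fragments carry the same Identification value but different fragment offsets, and every fragment except the last has the MF flag set. In addition, the Total Length field in each fragment's IP header is set to the length of that fragment, header included.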
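Reading those fields from a raw header makes the description concrete. The sketch below decodes the first 8 bytes of an IPv4 header with `struct` and reports the values that matter for fragmentation. The function name `parse_ipv4_fragment_fields` and the dictionary keys are hypothetical. It assumes the input starts at the IP header, with no link-layer bytes in front.

```python
import struct


def parse_ipv4_fragment_fields(header: bytes) -> dict:
    """Extract the fragmentation-related fields from a raw IPv4 header."""
    if len(header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    ver_ihl, _tos, total_len, ident, flags_frag = struct.unpack("!BBHHH", header[:8])
    if ver_ihl >> 4 != 4:
        raise ValueError("not an IPv4 header")
    return {
        "ihl_bytes": (ver_ihl & 0x0F) * 4,
        "total_length": total_len,            # length of THIS fragment, header included
        "id": ident,                          # identical in every fragment of a datagram
        "df": bool(flags_frag & 0x4000),      # Don't Fragment
        "mf": bool(flags_frag & 0x2000),      # set on every fragment except the last
        "offset_bytes": (flags_frag & 0x1FFF) * 8,
    }
```

For the second UDP fragment from the Scapy run (Total Length 520, ID 12345, offset field 63), it would report `mf=False` and `offset_bytes=504`.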
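The Ethernet MTU is 1500 bytes, so the data part of an IP datagram in one frame is at most 1480 bytes (the IP header takes 20 bytes). To observe fragmentation, we use ICMP to send a 1501-byte IP datagram: a 20-byte IP header plus a 1481-byte ICMP message. The 1481-byte ICMP message consists of an 8-byte ICMP header and 1473 bytes of data. This 1501-byte IP datagram is split into 2 IP fragments: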
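(1) Fragment 1: the first 1480 bytes of the ICMP message (including the 8-byte ICMP header) + 20-byte IP header = 1500-byte IP datagram, MF set
(2) Fragment 2: the remaining 1 byte of the ICMP message + 20-byte IP header = 21-byte IP datagram, MF not set. This fragment carries no ICMP header.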
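The split above can be computed mechanically. Each non-final fragment carries the largest multiple of 8 that fits in MTU minus the IP header. The last fragment takes whatever is left. This is a sketch assuming a 20-byte header with no options. The function name `plan_fragments` is hypothetical.

```python
def plan_fragments(ip_payload_len: int, mtu: int = 1500, ihl_bytes: int = 20):
    """Return (offset_bytes, total_length, more_fragments) for each fragment.

    ip_payload_len counts everything after the IP header (for ping: the
    8-byte ICMP header plus the data). Assumes no IP options are copied.
    """
    room = mtu - ihl_bytes
    per_fragment = room - room % 8      # non-final fragments carry a multiple of 8
    if per_fragment <= 0:
        raise ValueError("MTU too small for IPv4 fragmentation")
    plan = []
    offset = 0
    while offset < ip_payload_len:
        size = min(per_fragment, ip_payload_len - offset)
        is_last = offset + size == ip_payload_len
        plan.append((offset, ihl_bytes + size, not is_last))
        offset += size
    return plan


print(plan_fragments(8 + 1473))   # [(0, 1500, True), (1480, 21, False)]
```

As a cross-check, `plan_fragments(1004, mtu=524)` produces the two UDP fragments from the Scapy lab (524 and 520 bytes, offsets 0 and 504). Scapy, however, picked those sizes from its `fragsize` argument, not from an interface MTU.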
The Ethernet frame the user wants to send:
After fragmentation:
Note: the MF (More Fragments) bit in fragment 1's IP header is set to 1; in fragment 2 this bit is not set, i.e. it is 0.
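In fact, the length of the ICMP header depends on the message type and varies widely. We use 8 bytes here because the following example uses ping, and the ICMP echo request and echo reply messages that ping uses both have 8-byte headers.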
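The 8-byte echo header is type, code, checksum, identifier, and sequence number. The sketch below builds an echo request with the standard Internet checksum (one's-complement sum of 16-bit words). The function names are hypothetical. The identifier 4938 used in the usage line is simply the value seen in the capture below.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                       # pad an odd length with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def icmp_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """ICMP echo request: 8-byte header (type 8, code 0, checksum, id, seq) + data."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


msg = icmp_echo_request(4938, 1, b"x" * 1473)
print(len(msg))   # 1481: 8-byte header + 1473 bytes of data
```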
2. Observing IP fragmentation with tcpdump
Machine 1: Ubuntu 14.04, IP address 192.168.239.136
Machine 2: Ubuntu 11.04, IP address 192.168.239.145
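Machine 1 pings machine 2, sending 1473 bytes of data each time (the ICMP data portion) to trigger IP fragmentation, while tcpdump captures the packets the two hosts exchange: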
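Why 1473? It is one byte more than the largest ping payload that fits in a single 1500-byte packet. The post does not show the exact command. With iputils ping the data size is set with `-s`, so something like `ping -s 1473 192.168.239.145` is assumed here. A small sketch of the arithmetic (function names hypothetical, assuming no IP options):

```python
IPV4_HEADER = 20        # bytes, no options
ICMP_ECHO_HEADER = 8    # bytes


def max_unfragmented_ping_data(mtu: int) -> int:
    """Largest ICMP echo data size that still fits in one IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_ECHO_HEADER


def ping_fragments(data_len: int, mtu: int) -> bool:
    """True if an echo request with data_len bytes of data must be fragmented."""
    return data_len > max_unfragmented_ping_data(mtu)


print(max_unfragmented_ping_data(1500))   # 1472
print(ping_fragments(1473, 1500))         # True
```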
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
1. IP (tos 0x0, ttl 64, id 52457, offset 0, flags [+], proto ICMP (1), length 1500)
    192.168.239.136 > 192.168.239.145: ICMP echo request, id 4938, seq 1, length 1480
2. IP (tos 0x0, ttl 64, id 52457, offset 1480, flags [none], proto ICMP (1), length 21)
    192.168.239.136 > 192.168.239.145: ip-proto-1
3. IP (tos 0x0, ttl 64, id 36694, offset 0, flags [+], proto ICMP (1), length 1500)
    192.168.239.145 > 192.168.239.136: ICMP echo reply, id 4938, seq 1, length 1480
4. IP (tos 0x0, ttl 64, id 36694, offset 1480, flags [none], proto ICMP (1), length 21)
    192.168.239.145 > 192.168.239.136: ip-proto-1
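One ping exchange produces two IP datagrams (the echo request and the echo reply), each split into two fragments. Taking the echo request as an example:
(1) Both fragments carry the Identification value 52457, which shows that they are fragments of the same IP datagram.
(2) The first fragment's offset is 0 and the second's is 1480. The second fragment's offset is exactly the number of ICMP bytes carried by the first fragment.
(3) "flags [+]" in the first fragment means the MF flag is set, i.e. more fragments follow; "flags [none]" in the second fragment means MF is not set.
(4) The two fragments are 1500 bytes and 21 bytes long, respectively.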
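The same four checks can be run over a capture automatically. The sketch below pulls id, offset, flags, and length out of the `IP (...)` summaries that `tcpdump -v` prints and groups them by identification and protocol. This is an assumption-laden helper. It relies on the summary format shown in the capture above and treats a `+` inside the brackets as MF. It also groups by ID only, because the addresses sit on the next output line, whereas real reassembly also keys on source and destination. The names `IP_SUMMARY` and `group_fragments` are hypothetical.

```python
import re
from collections import defaultdict

# Matches the IP summary tcpdump -v prints, e.g.
# "IP (tos 0x0, ttl 64, id 52457, offset 0, flags [+], proto ICMP (1), length 1500)"
IP_SUMMARY = re.compile(
    r"id (?P<id>\d+), offset (?P<offset>\d+), flags \[(?P<flags>[^\]]*)\], "
    r"proto (?P<proto>\w+) \(\d+\), length (?P<length>\d+)"
)


def group_fragments(lines):
    """Map (id, proto) -> list of (offset_bytes, total_length, more_fragments)."""
    groups = defaultdict(list)
    for line in lines:
        m = IP_SUMMARY.search(line)
        if m is None:
            continue                       # not an IP summary line
        key = (int(m["id"]), m["proto"])
        more = "+" in m["flags"]           # tcpdump marks MF with "+"
        groups[key].append((int(m["offset"]), int(m["length"]), more))
    return dict(groups)
```

Applied to the four summaries above, it yields `[(0, 1500, True), (1480, 21, False)]` for ID 52457 and the same pattern for ID 36694.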