Reference: http://www.hyper-v.nu/archives/marcve/2013/01/lbfo-hyper-v-switch-qos-and-actual-performance-part-1/
EtherChannel Negotiation
An EtherChannel can be established using one of three mechanisms:
- PAgP - Cisco's proprietary negotiation protocol
- LACP (IEEE 802.3ad) - Standards-based negotiation protocol
- Static Persistence ("On") - No negotiation protocol is used
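Before EtherChannel is configured, STP blocks the redundant ports.
After configuration, STP treats the whole bundle as a single logical port, so no member link is blocked.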
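Question 1: NIC teaming can aggregate bandwidth, but it does not increase the bandwidth available to a single connection. Why?
Why can't the packets of one session be load balanced? In the layered network model, a session is split into many packets in transit and reassembled at the destination, and those packets are expected in order. Reordering does not simply disappear at the receiver: it has to be undone, and for TCP it triggers duplicate ACKs that shrink the sender's window (see question 3). Packets of one session must therefore travel in order over the same physical link. As a result, for any single session, a 10 Gb aggregate built from ten 1 Gb links is slower than a single 10 Gb link: that session is capped at 1 Gb.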
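A minimal Python sketch of the point above: a deterministic hash of a flow's addresses and ports always selects the same member link, so one session stays in order but is limited to one link, while many sessions spread over the bundle. The CRC32 hash, the `pick_link` name, and the addresses are illustrative assumptions, not Cisco's or VMware's actual algorithm.

```python
import zlib
from collections import Counter

def pick_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int, n_links: int) -> int:
    """Deterministically map one flow (its addresses and ports) to a member link index."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_links  # same inputs -> same link, every time

N_LINKS = 10        # ten 1 Gb member links
LINK_GBPS = 1.0

# One session: all 1000 packets hash to the same member link, so they stay in order...
one_session = {pick_link("10.0.0.1", "10.0.0.2", 50000, 443, N_LINKS) for _ in range(1000)}
# ...which also caps that session at a single link's speed.
single_session_gbps = LINK_GBPS * len(one_session)

# Many sessions (different source ports) are spread over the member links.
many_sessions = Counter(pick_link("10.0.0.1", "10.0.0.2", p, 443, N_LINKS)
                        for p in range(50000, 51000))

print(len(one_session))       # 1
print(single_session_gbps)    # 1.0
print(len(many_sessions))     # number of member links in use (more than 1)
```

Any deterministic hash would do; what matters is that the link choice depends only on per-flow fields, never on per-packet state.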
Cisco's explanation: EtherChannel reduces part of the binary pattern that the addresses in the frame form to a numerical value that selects one of the links in the channel in order to distribute frames across the links in a channel. EtherChannel frame distribution uses a Cisco-proprietary hashing algorithm. The algorithm is deterministic; if you use the same addresses and session information, you always hash to the same port in the channel. This method prevents out-of-order packet delivery.
All ports in each EtherChannel must be the same speed. You can base the load-balance policy (frame distribution) on a MAC address (Layer 2 [L2]), an IP address (Layer 3 [L3]), or a port number (Layer 4 [L4]). You can activate these policies, respectively, if you issue the set port channel all distribution {ip | mac | session | ip-vlan-session} [source | destination | both] command. The session keyword is supported on the Supervisor Engine 2 and Supervisor Engine 720. The ip-vlan-session keyword is only supported on the Supervisor Engine 720. Use this keyword in order to specify the frame distribution method, with the IP address, VLAN, and Layer 4 traffic.
Question 2: At which OSI layer does LACP run?
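If the physical switch also aggregates links, we first need to understand how the physical switch and the host aggregate the links between them, that is, LACP.
Cisco's proprietary link aggregation technology is EtherChannel. Supported scenarios: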
- One IP to many IP connections. (Host A making two connection sessions to Host B and C)
- Many IP to many IP connections. (Host A and B multiple connection sessions to Host C, D, etc)
Note: One IP to one IP connections over multiple NICs is not supported. (Host A one connection session to Host B uses only one NIC).
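LACP runs at the MAC layer (the MAC sublayer of the data link layer, Layer 2). It assumes that all links are full-duplex, point-to-point, and of the same speed.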
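To make "LACP runs at Layer 2" concrete, here is a Python sketch that builds only the start of an LACPDU frame: an Ethernet header addressed to the IEEE Slow Protocols multicast address with EtherType 0x8809, followed by the LACP subtype and version bytes. There is no IP header anywhere. The actor/partner TLVs and padding are deliberately omitted, and the source MAC is a made-up example.

```python
import struct

SLOW_PROTOCOLS_MAC = bytes.fromhex("0180c2000002")  # IEEE Slow Protocols multicast address
SLOW_PROTOCOLS_ETHERTYPE = 0x8809
LACP_SUBTYPE = 0x01
LACP_VERSION = 0x01

def lacpdu_header(src_mac: bytes) -> bytes:
    """Ethernet header + LACP subtype/version (TLVs and padding intentionally left out)."""
    if len(src_mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    eth = SLOW_PROTOCOLS_MAC + src_mac + struct.pack("!H", SLOW_PROTOCOLS_ETHERTYPE)
    return eth + bytes([LACP_SUBTYPE, LACP_VERSION])

hdr = lacpdu_header(bytes.fromhex("001122334455"))
print(hdr.hex())  # 0180c200000200112233445588090101
```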

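3. What is out-of-order forwarding?
By the principle of network layering, TCP and IP forwarding are independent of each other: the forwarding plane (or router) forwards packets on a best-effort basis, and TCP is unaware of the underlying links. To make full use of the available bandwidth, TCP starts in slow start and grows its congestion window quickly until packet loss occurs. It then shrinks the window: on three duplicate ACKs from the receiver it performs fast retransmit and fast recovery, roughly halving the window and continuing in congestion avoidance; on a retransmission timeout it falls back to slow start. After that it grows the window again and keeps sending.
IP forwarding is free to ignore how TCP behaves; the protocols do not require anything else. Still, it would be better if IP forwarding could do something to help TCP flows run more smoothly.
Below is an example of how reordering in multi-core forwarding lowers TCP throughput, and how to solve it.
Suppose the sender sends 5 packets with sequence numbers 1, 2, 3, 4, 5, and the receiver expects them in that order. If the receiver gets 1, then misses 2 but receives 3, 4, and 5, it sends three duplicate ACKs, each indicating that the next expected sequence number is 2. After three duplicate ACKs in a row, the sender performs fast retransmit and enters fast recovery: the congestion window is cut to half plus 3 segments, and the smaller window reduces the sender's throughput. Think of a faucet: at first it is fully open and the flow is strong; after the problem it is opened only a little more than halfway, and the flow drops considerably.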
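A small Python sketch of the example above: a receiver that sends one cumulative ACK per arriving packet, a counter for duplicate ACKs, and the Reno-style fast-recovery arithmetic (ssthresh = half the window, new window = ssthresh + 3 segments). Using the current window in place of the flight size, and the 20-segment starting window, are simplifying assumptions.

```python
def receiver_acks(arrivals: list[int]) -> list[int]:
    """Cumulative ACKs for packets numbered 1..N arriving in the given order.

    Each ACK carries the next in-order sequence number the receiver expects.
    """
    received: set[int] = set()
    expected = 1
    acks = []
    for seq in arrivals:
        received.add(seq)
        while expected in received:  # advance past any contiguous run now filled
            expected += 1
        acks.append(expected)
    return acks

def count_dup_acks(acks: list[int]) -> int:
    """Number of ACKs that repeat the previous ACK value."""
    return sum(1 for prev, cur in zip(acks, acks[1:]) if cur == prev)

def reno_fast_recovery(cwnd: int) -> tuple[int, int]:
    """Window change (in segments) after 3 duplicate ACKs: (ssthresh, new cwnd)."""
    ssthresh = max(cwnd // 2, 2)
    return ssthresh, ssthresh + 3

acks = receiver_acks([1, 3, 4, 5, 2])
print(acks)                    # [2, 2, 2, 2, 6]
print(count_dup_acks(acks))    # 3
print(reno_fast_recovery(20))  # (10, 13)
```

Packet 2 is not lost, only late, yet the sender still sees three duplicate ACKs and cuts its window.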
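With single-core forwarding this is not much of a problem: packets are usually processed first come, first served, so their order is preserved.
With multi-core forwarding, however, the problem appears easily. Packets from the same input port are handled by several cores, and because different packets take different processing paths (TCP, UDP, ICMP, and so on), some are processed faster than others. In the example above, suppose the system has 5 cores handling packets 1, 2, 3, 4, and 5 respectively. Core 2 is slower for some reason, or is blocked, while cores 3, 4, and 5 are fast and forward packets 3, 4, and 5 first. Because the packets the receiver gets first are not the one it expects, it sends three duplicate ACKs saying it expects sequence number 2, which shrinks the sender's window and lowers throughput.
In this case, the reordering is caused by the forwarding system itself.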
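A toy Python model of the 5-core example: packet i arrives at time i-1 and is handed to core i, and it leaves when that core finishes. The processing times are made-up numbers; the point is only that uneven per-core latency reorders a single flow, while a single FIFO core cannot.

```python
def multicore_order(arrival_gap: float, proc_times: list[float]) -> list[int]:
    """One core per packet: each packet leaves at arrival + its own processing time."""
    done = []
    for seq, proc in enumerate(proc_times, start=1):
        arrival = (seq - 1) * arrival_gap
        done.append((arrival + proc, seq))
    done.sort()  # by departure time; ties keep sequence order
    return [seq for _, seq in done]

def single_core_order(arrival_gap: float, proc_times: list[float]) -> list[int]:
    """One FIFO core: a packet cannot start before the previous one has finished."""
    t = 0.0
    done = []
    for seq, proc in enumerate(proc_times, start=1):
        start = max(t, (seq - 1) * arrival_gap)
        t = start + proc
        done.append((t, seq))
    return [seq for _, seq in sorted(done)]

times = [2.0, 10.0, 2.0, 2.0, 2.0]    # core 2 is stalled
print(multicore_order(1.0, times))    # [1, 3, 4, 5, 2]
print(single_core_order(1.0, times))  # [1, 2, 3, 4, 5]
```

Feeding the multi-core order into a receiver that sends cumulative ACKs yields exactly the three duplicate ACKs described above.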
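4. In which scenarios is ESXi IP-hash NIC teaming + LACP (dynamic) on the physical switch used?
Reference: http://www.supersonicdog.com/2013/04/24/lacp/
Requirements: vSphere 5.1 + vSphere Distributed Switch; LACP can only be configured in the vSphere Web Client.
Suitable for: traffic to many different IP addresses, e.g., a web server.
Benefit: multiple IP sessions from one VM are distributed across multiple physical NICs; the same VM can use both links for different TCP or UDP sessions.
Not suitable for: traffic between fixed IP endpoints, such as storage access (e.g., a VM accessing NFS storage). The source and destination addresses in the IP header never change, so every packet hashes to the same link.
Concept: LACP must be configured on both the virtual switch and the physical switch (it governs inbound traffic); outbound traffic is governed by the NIC teaming policy, which must be set to IP hash.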
For VMs that host applications needing access to multiple target IP addresses, LACP links combined with the IP hash load-balancing algorithm provide a good balance of traffic across all connections. Compared to traditional NIC teaming, all links are utilized simultaneously. While traditional NIC teaming is simple to configure, without any extra steps needed on the physical switch, a given VM can only be active on one link at a time (the MAC appearing on two switch ports that are not LACP-configured would cause one of the ports to be shut down).
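A Python sketch of why IP hash suits a web server but not NFS. It models ESXi's "Route based on IP hash" as it is commonly described (XOR of the last octet of the source and destination IPv4 addresses, modulo the number of uplinks); treat that exact formula, the two-uplink setup, and all addresses as assumptions.

```python
import ipaddress
from collections import Counter

def ip_hash_uplink(src_ip: str, dst_ip: str, n_uplinks: int) -> int:
    """Uplink index from (last octet of src XOR last octet of dst) mod uplink count."""
    src_lsb = int(ipaddress.IPv4Address(src_ip)) & 0xFF
    dst_lsb = int(ipaddress.IPv4Address(dst_ip)) & 0xFF
    return (src_lsb ^ dst_lsb) % n_uplinks

vm = "192.168.1.10"

# Web server answering 100 different clients: sessions spread over both uplinks.
web = Counter(ip_hash_uplink(vm, f"10.0.0.{i}", 2) for i in range(1, 101))

# VM talking to one NFS target: every packet picks the same uplink.
nfs = {ip_hash_uplink(vm, "192.168.1.200", 2) for _ in range(100)}

print(sorted(web.items()))  # [(0, 50), (1, 50)]
print(nfs)                  # {0}
```

Whatever the precise hash, it is a function of the address pair only, so a single fixed pair such as VM-to-NFS-server can never use more than one uplink.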
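5. In which scenarios is ESXi NIC teaming + static link aggregation on the physical switch used?
Static teaming (IEEE 802.3ad draft v1)
Advantage: it works when the switch does not support LACP but does support static link aggregation.
Disadvantage: each flow from a VM can use only one NIC's bandwidth. Static aggregation cannot detect cabling or configuration errors.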
In Static teaming mode there is no check for incorrectly plugged cables or other errors. This mode is useful when the preferred bandwidth exceeds a single physical NIC and the switch does not support LACP, but the switch does support static teaming.
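6. What happens if two NICs of one ESXi server are connected to the same physical switch (without NIC teaming)?
Reference: http://blog.ipspace.net/2010/11/vmware-virtual-switch-no-need-for-stp.html
Both NICs belong to one vSwitch. Because the vSwitch runs neither LACP nor STP, both links are active. Instead of relying on STP or port blocking, the vSwitch avoids forwarding loops with a special forwarding rule: split-horizon switching (Cisco UCS documentation uses the term End Host Mode).
7. How does a vSwitch differ from a physical switch?
Reference: http://blog.ipspace.net/2010/11/vmware-virtual-switch-no-need-for-stp.html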
Ports are not equal
In a traditional Ethernet switch, the same forwarding rules are used for all ports. Virtual switch uses different forwarding rules for vNICs and uplinks.
No MAC address learning
The hypervisor knows the MAC addresses of all virtual machines running in the ESX server; there’s no need to perform MAC address learning.
Spanning Tree Protocol is ignored
Virtual switch is not running Spanning Tree Protocol (STP) and does not send STP Bridge Protocol Data Units (BPDU). STP BPDUs received by the virtual switch are ignored. Uplinks are never blocked based on STP information.
As ESX doesn’t run STP, you should also configure spanning-tree portfast on the physical switch ports connected to the ESX uplinks.
Split-horizon forwarding
Packets received through one of the uplinks are never forwarded to other uplinks. This rule prevents forwarding loops through the virtual switch.
Limited flooding of broadcasts/multicasts
Broadcast or multicast packets originated by a virtual machine are sent to all other virtual machines in the same port group (VMware terminology for a VLAN). They are also sent through one of the uplinks like a regular unicast packet (they are not flooded through all uplinks). This ensures that the outside network receives a single copy of the broadcast.
The uplink through which the broadcast packet is sent is chosen based on the load balancing mode configured for the virtual switch or the port group.
Broadcasts/multicasts received through an uplink port are sent to all virtual machines in the port group (identified by VLAN tag), but not to other uplinks (see split-horizon forwarding).
No flooding of unknown unicasts
Unicast packets sent from virtual machines to unknown MAC addresses are sent through one of the uplinks (selected based on the load balancing mode). They are not flooded.
Unicast packets received through the uplink ports and addressed to unknown MAC addresses are dropped.
Reverse Path check based on source MAC address
The virtual switch sends a single copy of a broadcast/multicast/unknown unicast packet to the outside network (see the no flooding rules above), but the physical switch always performs full flooding and sends copies of the packet back to the virtual switch through all other uplinks. VMware thus has to check the source MAC addresses of packets received through the uplinks. Packet received through one of the uplinks and having a source MAC address belonging to one of the virtual machines is silently dropped.
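8. What is BPDU Filter?
References: https://blogs.vmware.com/vsphere/2012/11/vsphere-5-1-vds-new-features-bpdu-filter.html
http://rickardnobel.se/esxi-5-1-bdpu-guard/
BPDUs
BPDUs (Bridge Protocol Data Units) are the frames that STP switches exchange. STP has no authentication mechanism and trusts every BPDU, so BPDUs can be spoofed.
The virtual switch does not support STP: it never sends BPDUs of its own and does not process BPDUs received from the physical switch.
A VM that generates and propagates BPDUs can bring down an entire cluster, for example by sending a spoofed BPDU to win the root bridge role.
To stop specific ports from accepting BPDUs, vendors introduced BPDU Guard (Cisco) and BPDU Protection (HP).
As soon as a BPDU is seen on such a port, the port is shut down.
This led to BPDU Filter, which applies to both the VDS and the VSS. It must be enabled host by host.
Configuration: on each ESXi host, set the advanced setting Net.BlockGuestBPDU to 1 (the default, 0, leaves the filter disabled).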
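A Python sketch contrasting the two mechanisms: BPDU Guard on the physical switch disables the whole port when any BPDU shows up, while a host-side BPDU filter silently drops guest BPDUs so the uplink never sees them. Only the IEEE 802.1D STP destination address is checked (other vendor BPDU formats are ignored), and the class and function names are hypothetical.

```python
STP_BRIDGE_GROUP_MAC = "01:80:c2:00:00:00"  # IEEE 802.1D destination address of STP BPDUs

def is_bpdu(dst_mac: str) -> bool:
    return dst_mac.lower() == STP_BRIDGE_GROUP_MAC

class GuardedSwitchPort:
    """Physical-switch edge port with BPDU Guard: any BPDU err-disables the port."""
    def __init__(self) -> None:
        self.err_disabled = False

    def receive(self, dst_mac: str) -> bool:
        if is_bpdu(dst_mac):
            self.err_disabled = True
        return not self.err_disabled  # True if the frame is forwarded

def host_bpdu_filter(dst_mac: str) -> bool:
    """Host-side filter: drop guest BPDUs, pass everything else."""
    return not is_bpdu(dst_mac)

def send_from_guest(dst_mac: str, port: GuardedSwitchPort, filter_on: bool) -> bool:
    if filter_on and not host_bpdu_filter(dst_mac):
        return False                  # dropped on the host; the uplink never sees it
    return port.receive(dst_mac)

unprotected = GuardedSwitchPort()
send_from_guest(STP_BRIDGE_GROUP_MAC, unprotected, filter_on=False)
print(unprotected.err_disabled)       # True: one guest BPDU took the uplink down

protected = GuardedSwitchPort()
send_from_guest(STP_BRIDGE_GROUP_MAC, protected, filter_on=True)
print(protected.err_disabled)         # False
print(send_from_guest("00:50:56:00:00:01", protected, filter_on=True))  # True
```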
9. Can LACP bond ports on two different switches?
LACP itself doesn't provide the ability to bond across multiple switches; it bonds across multiple ports on a single ethernet switch, and depending on the vendor there might even be restrictions on which ports on a switch can be bonded together.
Some vendors have proprietary protocols (typically called MLAG) that allow for bonded ethernet channels across different ethernet switches; this may not be helpful when working with a server's ethernet ports.
10. Consequences of not configuring the physical and virtual switches consistently
Without synchronized ESX-switch configuration you can experience one of the following two symptoms:
- Enabling static LAG on the physical switch (pSwitch), but not using IP-hash-based load balancing on vSwitch: frames from the pSwitch will arrive to ESX through an unexpected interface and will be ignored by vSwitch. Definitely true if you use active/standby NIC configuration in vSwitch, probably also true in active/active per-VM-load-balancing configuration (need to test it, but I suspect loop prevention checks in vSwitch might kick in).
- Enabling IP-hash-based load balancing in vSwitch without corresponding static LAG on the pSwitch: pSwitch will go crazy with MACFLAP messages and might experience performance issues and/or block traffic from the offending MAC addresses (Duncan Epping has experienced a nice network meltdown in a similar situation).
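11. Overall diagram (figure not available)
12. LACP + NIC teaming example
Configuration on the virtual switch:
On the uplink port group, set LACP to Active or Passive mode.
On the port group, set the load-balancing policy to IP hash.
On the physical switch, configure LACP and VLANs correctly (all ports in the group must carry the same VLANs).
Reference: http://stretch-cloud.info/2013/09/lacp-primer-vsphere-5-5/
13. LACP caveats
ESXi 5.1 supports only one LAG per vDS, but you can create multiple vDSes to get multiple LAGs.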
In vSphere 5.1, the LACP implementation has some constraints: it supports only one LAG per VDS per host, all uplinks in the dvuplink port group are included in this LAG, and only the IP hash load-balancing algorithm is supported. (Source: http://stretch-cloud.info/2013/09/much-awaited-lacp-enhancement-vsphere-5-5/)
Hashing Algorithm - The hashing algorithm determines the LAG member used for traffic. LACP can use different properties of the outgoing traffic (e.g. source IP/port number) to distribute traffic across all the links participating in a LAG. When LACP is configured on the physical switch, a hash algorithm must also be chosen there; it decides how inbound traffic is distributed across the LAG.
14. Why are multiple LAGs needed?
A:DC networks moving towards 10GbE, which require multiple etherchannels
B:Hosts with mix of 1GbE and 10GbE NICs need multiple etherchannel support
15. LACP enhancements in vSphere 5.5
Enhancement In vSphere 5.5
Support for multiple LACP LAGs
Max 32 LAGs per host
Max 64 LAGs per VDS
Support for all LACP hashing algorithms (22)
Note: Uplinks must be going to either the same switch or a pair of switches appearing as a single logical switch (using vPC, VSS, MLAG, SMLT, or similar technology).
16. Cisco link aggregation concepts: EtherChannel and PAgP
CatOS
The Cisco-proprietary hash algorithm computes a value in the range 0 to 7. With this value as a basis, a particular port in the EtherChannel is chosen. The port setup includes a mask which indicates which values the port accepts for transmission. With the maximum number of ports in a single EtherChannel, which is eight ports, each port accepts only one value. If you have four ports in the EtherChannel, each port accepts two values, and so forth. This table lists the ratios of the values that each port accepts, which depends on the number of ports in the EtherChannel:
Number of Ports in the EtherChannel | Hash Values Accepted per Port (Load-Balancing Ratio)
8 | 1:1:1:1:1:1:1:1
7 | 2:1:1:1:1:1:1
6 | 2:2:1:1:1:1
5 | 2:2:2:1:1
4 | 2:2:2:2
3 | 3:3:2
2 | 4:4
Note: This table only lists the number of values, which the hash algorithm calculates, that a particular port accepts. You cannot control the port that a particular flow uses. You can only influence the load balance with a frame distribution method that results in the greatest variety.
Note: The hash algorithm cannot be configured or changed to load balance the traffic among the ports in an EtherChannel.
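The ratios in the table follow from spreading the 8 hash values (0 to 7) over the member ports as evenly as possible. This Python sketch reproduces the table; which specific values land on which port is not modeled, and `values_per_port` is a hypothetical helper.

```python
def values_per_port(n_ports: int, n_values: int = 8) -> list[int]:
    """How many of the 8 hash values each port accepts, spread as evenly as possible."""
    if not 1 <= n_ports <= n_values:
        raise ValueError("an EtherChannel has 1 to 8 ports")
    base, extra = divmod(n_values, n_ports)
    return [base + 1 if i < extra else base for i in range(n_ports)]

for n in range(8, 1, -1):
    ratio = values_per_port(n)
    print(n, ":".join(map(str, ratio)))
```

With 3 ports, for example, two ports each receive 3/8 of the hash space and the third only 2/8, so even perfectly varied traffic is not split equally.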
Issue the show port channel mod/port info command in order to check the frame distribution policy. In version 6.1(x) and later, you can determine the port for use in the port channel to forward traffic, with the frame distribution policy as the basis. The command for this determination is show channel hash channel-id {src_ip_addr | dest_ip_addr | src_mac_addr | dest_mac_addr | src_port |dest_port} [dest_ip_addr | dest_mac_addr | dest_port] .
These are some examples:
Console> (enable) show channel hash 865 10.10.10.1 10.10.10.2
Selected channel port: 1/1

Console> (enable) show channel hash 865 00-02-fc-26-24-94 00-d0-c0-d7-2d-d4
Selected channel port: 1/2
Cisco IOS
EtherChannel load balancing can use MAC addresses, IP addresses, or Layer 4 port numbers with a Policy Feature Card 2 (PFC2) and either source mode, destination mode, or both. The mode you select applies to all EtherChannels that you configure on the switch. Use the option that provides the greatest variety in your configuration. For example, if the traffic on a channel only goes to a single MAC address, use of the destination MAC address results in the choice of the same link in the channel each time. Use of source addresses or IP addresses can result in a better load balance. Issue the port-channel load-balance {src-mac | dst-mac | src-dst-mac | src-ip | dst-ip | src-dst-ip | src-port | dst-port | src-dst-port | mpls} global configuration command in order to configure the load balancing.
6509#remote login switch
Trying Switch ...
Entering CONSOLE for Switch
Type "^C^C^C" to end this session
6509-sp#test etherchannel load-balance interface port-channel 1 ip 10.10.10.2 10.10.10.1
Would select Gi6/1 of Po1
6509-sp#

6509#remote login switch
Trying Switch ...
Entering CONSOLE for Switch
Type "^C^C^C" to end this session
6509-sp#test etherchannel load-balance interface port-channel 1 mac 00d0.c0d7.2dd4 0002.fc26.2494
Would select Gi6/1 of Po1
6509-sp#
What Is PAgP and Where Do You Use It?
PAgP aids in the automatic creation of EtherChannel links. PAgP packets are sent between EtherChannel-capable ports in order to negotiate the formation of a channel. Some restrictions are deliberately introduced into PAgP. The restrictions are:
- PAgP does not form a bundle on ports that are configured for dynamic VLANs. PAgP requires that all ports in the channel belong to the same VLAN or are configured as trunk ports. When a bundle already exists and a VLAN of a port is modified, all ports in the bundle are modified to match that VLAN.
- PAgP does not group ports that operate at different speeds or port duplex. If speed and duplex change when a bundle exists, PAgP changes the port speed and duplex for all ports in the bundle.
- PAgP modes are off, auto, desirable, and on. Only the combinations auto-desirable, desirable-desirable, and on-on allow the formation of a channel. The device on the other side must have PAgP set to on if a device on one side of the channel does not support PAgP, such as a router.
PAgP is currently supported on these switches:
- Catalyst 4500/4000
- Catalyst 5500/5000
- Catalyst 6500/6000
- Catalyst 2940/2950/2955/3550/3560/3750
- Catalyst 1900/2820
These switches do not support PAgP:
- Catalyst 2900XL/3500XL
- Catalyst 2948G-L3/4908G-L3
- Catalyst 8500
ISL/802.1Q Trunking Support on EtherChannel
You can configure EtherChannel connections with or without Inter-Switch Link Protocol (ISL)/IEEE 802.1Q trunking. After the formation of a channel, the configuration of any port in the channel as a trunk applies the configuration to all ports in the channel. Identically configured trunk ports can be configured as an EtherChannel. You must have all ISL or all 802.1Q; you cannot mix the two. ISL/802.1Q encapsulation, if enabled, takes place independently of the source/destination load-balancing mechanism of Fast EtherChannel. The VLAN ID has no influence on the link that a packet takes. ISL/802.1Q simply enables that trunk to belong to multiple VLANs. If trunking is not enabled, all ports that are associated with the Fast EtherChannel must belong to the same VLAN.
To put an interface in PAgP desirable mode, use: "channel-group 1 mode desirable";
To put an interface in PAgP auto mode, use: "channel-group 1 mode auto";
To put an interface in LACP active mode, use: "channel-group 1 mode active";
To put an interface in LACP passive mode, use: "channel-group 1 mode passive".
Port-channel load balancing: port-channel load-balance
sw1(config)#port-channel load-balance ?
dst-ip Dst IP Addr
dst-mac Dst Mac Addr
src-dst-ip Src XOR Dst IP Addr
src-dst-mac Src XOR Dst Mac Addr
src-ip Src IP Addr
src-mac Src Mac Addr
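A Python sketch of why the choice among these options matters (the situation described in the Cisco IOS paragraph earlier: traffic going to a single destination). It uses a simplified hash (low 3 bits of the selected value, modulo the link count) that is an assumption, not Cisco's actual algorithm; `src-dst-ip` follows the "Src XOR Dst" description in the help output. Addresses are made up.

```python
import ipaddress
from collections import Counter

def member_link(method: str, src_ip: str, dst_ip: str, n_links: int) -> int:
    """Simplified IP-based port-channel hash: low 3 bits of the chosen key, mod n_links."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    key = {"src-ip": src, "dst-ip": dst, "src-dst-ip": src ^ dst}[method]
    return (key & 0x7) % n_links

SERVER = "10.1.1.200"                          # every client talks to this one server
clients = [f"10.1.1.{i}" for i in range(1, 65)]

usage = {}
for method in ("dst-ip", "src-ip", "src-dst-ip"):
    usage[method] = Counter(member_link(method, c, SERVER, 2) for c in clients)
    print(method, dict(sorted(usage[method].items())))
# dst-ip {0: 64}
# src-ip {0: 32, 1: 32}
# src-dst-ip {0: 32, 1: 32}
```

With a single destination, `dst-ip` puts every flow on the same link, while the source-based and XOR-based options spread them.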
1. An EtherChannel can bundle at most 8 physical links.
2. Bundled ports must follow these rules:
(1) Same VLAN
(2) Same trunking mode
(3) Same speed and duplex
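The bundling rules above can be expressed as a small consistency check. This Python sketch is hypothetical (`PortConfig` and `bundle_problems` are illustrative names) and simplifies "same VLAN" to a single VLAN number per port.

```python
from dataclasses import dataclass

MAX_MEMBERS = 8

@dataclass(frozen=True)
class PortConfig:
    name: str
    vlan: int          # access VLAN (simplification: one VLAN number per port)
    trunk: bool        # trunking mode on/off
    speed_mbps: int
    duplex: str        # "full" or "half"

def bundle_problems(ports: list[PortConfig]) -> list[str]:
    """Reasons these ports cannot form one EtherChannel; empty list means they can."""
    problems = []
    if len(ports) > MAX_MEMBERS:
        problems.append(f"{len(ports)} ports exceeds the {MAX_MEMBERS}-link limit")
    for attr in ("vlan", "trunk", "speed_mbps", "duplex"):
        values = {getattr(p, attr) for p in ports}
        if len(values) > 1:
            problems.append(f"mismatched {attr}: {sorted(map(str, values))}")
    return problems

fa1 = PortConfig("Fa0/1", 1, True, 100, "full")
fa2 = PortConfig("Fa0/2", 1, True, 100, "full")
gi1 = PortConfig("Gi0/1", 1, True, 1000, "full")
print(bundle_problems([fa1, fa2]))  # []
print(bundle_problems([fa1, gi1]))  # ["mismatched speed_mbps: ['100', '1000']"]
```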
17. VMware's description of LACP and its limitations:
LACP is a standards-based method to control the bundling of several physical network links together to form a logical channel for increased bandwidth and redundancy purposes. LACP enables a network device to negotiate an automatic bundling of links by sending LACP packets to the peer.
LACP works by sending frames down all links that have the protocol enabled. If it finds a device on the other end of the link that also has LACP enabled, it also sends frames independently along the same links, enabling the two units to detect multiple links between themselves and then combine them into a single logical link.
This dynamic protocol provides these advantages over the static link aggregation method supported by previous versions of vSphere:
- Plug and Play – Automatically configures and negotiates between host and access layer physical switch
- Dynamic – Detects link failures and cabling mistakes and automatically reconfigures the links
LACP limitations on a vSphere Distributed Switch
- vSphere supports only one LACP group (Uplink Port Group with LACP enabled) per distributed switch and only one LACP group per host (vSphere 5.1)
- LACP does not support Port mirroring
- LACP settings do not exist in host profiles
- LACP only works with IP Hash load balancing and Link Status Network failover detection
- LACP between two nested ESXi hosts is not possible
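18. Cisco link aggregation example
EtherChannels come in two types: Layer 2 and Layer 3.
Bundling Ethernet links increases bandwidth and provides load balancing. Topology (diagram not available): SW1 and SW2 are connected by two links on Fa0/1 and Fa0/2.
SW1 configuration: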
interface FastEthernet0/1
channel-group 1 mode desirable
switchport mode trunk
interface FastEthernet0/2
channel-group 1 mode desirable
switchport mode trunk
interface Port-channel 1
switchport mode trunk
SW2 configuration:
interface FastEthernet0/1
channel-group 1 mode desirable
switchport mode trunk
interface FastEthernet0/2
channel-group 1 mode desirable
switchport mode trunk
interface Port-channel 1
switchport mode trunk
Use show etherchannel summary to check the EtherChannel status:
SW2#show etherchannel summary
Flags: D - down P - in port-channel
I - stand-alone s - suspended
H - Hot-standby (LACP only)
R - Layer3 S - Layer2
U - in use f - failed to allocate aggregator
u - unsuitable for bundling
w - waiting to be aggregated
d - default port
Number of channel-groups in use: 1
Number of aggregators: 1
Group Port-channel Protocol Ports
------+-------------+-----------+----------------------------------------------
1 Po1(SU) PAgP Fa0/1(P) Fa0/2(P)
S means a Layer 2 EtherChannel, U means the port-channel is in use (up), and P means both interfaces are bundled in the port-channel.
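A small Python sketch that decodes the flag letters of a `show etherchannel summary` group row, using the legend from the output above. The function names are hypothetical, and it assumes each group row fits on one line as shown; real output can wrap long member lists.

```python
import re

FLAG_MEANINGS = {
    "D": "down", "P": "in port-channel", "I": "stand-alone", "s": "suspended",
    "H": "hot-standby (LACP only)", "R": "Layer 3", "S": "Layer 2", "U": "in use",
    "f": "failed to allocate aggregator", "u": "unsuitable for bundling",
    "w": "waiting to be aggregated", "d": "default port",
}
TOKEN = re.compile(r"([^()\s]+)\(([A-Za-z]+)\)")

def decode(token: str) -> tuple[str, list[str]]:
    """Split a token such as 'Po1(SU)' into its name and the meanings of its flags."""
    m = TOKEN.fullmatch(token)
    if m is None:
        raise ValueError(f"unexpected token: {token!r}")
    name, flags = m.groups()
    return name, [FLAG_MEANINGS[f] for f in flags]

def parse_summary_row(row: str) -> dict:
    """Parse one group row: group number, port-channel, protocol, member ports."""
    group, channel, protocol, *members = row.split()
    return {
        "group": int(group),
        "channel": decode(channel),
        "protocol": protocol,
        "members": [decode(m) for m in members],
    }

row = parse_summary_row("1      Po1(SU)         PAgP      Fa0/1(P)    Fa0/2(P)")
print(row["channel"])  # ('Po1', ['Layer 2', 'in use'])
print(row["members"])  # [('Fa0/1', ['in port-channel']), ('Fa0/2', ['in port-channel'])]
```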
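Note: configuration on the logical interface (the port-channel) overrides configuration on the physical member interfaces.
That shows the result!
Also pay attention to the EtherChannel modes.
EtherChannel modes:
1. PAgP modes:
on: no negotiation; no negotiation traffic is sent (similar to nonegotiate).
auto: passive negotiating state. Accepts negotiation from the peer but never initiates it. (default)
desirable: active negotiating state. Actively negotiates by sending PAgP packets.
2. LACP modes:
passive: passive negotiating state. Responds to LACP packets but never initiates negotiation. (default)
active: active negotiating state. Actively initiates negotiation.
Mode combinations:
on + on: channel forms
desirable + desirable: channel forms
desirable + auto: channel forms
auto + auto: channel does not form
auto + on: channel does not form
active + active: channel forms
passive + active: channel forms
passive + passive: channel does not form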
S1(config)# interface range f0/13 -15
S1(config-if-range)# channel-group 1 mode ?
  active     Enable LACP unconditionally
  auto       Enable PAgP only if a PAgP device is detected
  desirable  Enable PAgP unconditionally
  on         Enable Etherchannel only
  passive    Enable LACP only if a LACP device is detected
S1(config-if-range)# channel-group 1 mode active
Creating a port-channel interface Port-channel 1
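The mode-combination rules listed above fit in one small function. This Python sketch (a hypothetical helper, not a Cisco tool) encodes them: `on` only pairs with `on`, a PAgP pair needs at least one `desirable` side, an LACP pair needs at least one `active` side, and PAgP never pairs with LACP.

```python
PAGP_MODES = {"auto", "desirable"}
LACP_MODES = {"active", "passive"}

def channel_forms(a: str, b: str) -> bool:
    """Whether an EtherChannel forms between two ends configured with these modes."""
    modes = {a, b}
    if modes == {"on"}:
        return True                       # static: both ends must be 'on'
    if "on" in modes:
        return False                      # 'on' never negotiates with a PAgP/LACP peer
    if modes <= PAGP_MODES:
        return "desirable" in modes       # someone must actively send PAgP
    if modes <= LACP_MODES:
        return "active" in modes          # someone must actively send LACP
    return False                          # PAgP on one end, LACP on the other (or 'off')

for pair in [("on", "on"), ("desirable", "auto"), ("auto", "auto"),
             ("auto", "on"), ("passive", "active"), ("passive", "passive")]:
    print(pair, channel_forms(*pair))
```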
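19. The three virtual switch configuration modes: VST (Virtual Switch Tagging), VGT (Virtual Guest Tagging), and EST (External Switch Tagging)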

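VMFS is VMware's file system, VMDK is VMware's virtual disk file format, and RDM stands for Raw Device Mapping.
In VMDK mode, ESXi mounts the LUN as storage and uses it as a datastore: the LUN is formatted as VMFS, and the VM's virtual disks are stored as VMDK files on that VMFS datastore. In RDM mode, the LUN is treated as an independent disk, that is, a LUN on the storage array, and it can carry any file system (NTFS, ext3, ext4, FAT32, and so on), depending on the operating system that controls it. The VM reads and writes the LUN directly as a raw disk, without translation by the hypervisor.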