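In the previous article we set up a DPDK/VPP learning environment on a Tencent Cloud host. This article covers how to configure VLAN dot1q termination in VPP and how packets are forwarded through it. First, some concepts related to bridge domains (BD).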
BD (Bridge Domain): a bridge domain is a logical set of ports used for frame forwarding and broadcast control within a LAN, similar to a Linux bridge on a Linux system.
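BDI (Bridge Domain Interface): a Layer 3 interface attached to a bridge domain that lets traffic flow in both directions between the Layer 2 bridged network and the Layer 3 routed network. It gives a Layer 2 Ethernet segment a Layer 3 (IP) presence, supporting IP termination, Layer 3 VPN termination and address resolution (ARP). In short, a BDI is used wherever traffic has to cross between Layer 2 and Layer 3, especially when flexible routing between the two layers is needed.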
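BVI (Bridge Virtual Interface): a routable Layer 3 logical interface that represents a group of bridged interfaces. Because the bridge group is given a single virtual router interface, the whole bridged segment appears to the routing side as one network entity, which simplifies network management and configuration.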
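In practice, BDI and BVI play the same role: each is the Layer 3 interface of a bridge domain, connecting the bridged segment to the routed network. The name differs mainly by platform (Cisco IOS XE uses BDI, Cisco IOS XR uses BVI); VPP uses BVI, which is what this article configures.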
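To make the BD/BVI relationship concrete, here is a toy C model. All type and function names are hypothetical and do not reflect VPP's data structures: a bridge domain is just a list of member interfaces (by sw_if_index), one of which may be flagged as the BVI. A broadcast frame is flooded to every member except the one it arrived on, and the BVI is simply one of those members, which is how bridged traffic reaches the L3 side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BD_MAX_MEMBERS 8

/* Toy bridge domain: a set of member interfaces (by sw_if_index),
 * at most one of which is flagged as the BVI. Not VPP's layout. */
typedef struct {
    int bd_id;
    int members[BD_MAX_MEMBERS];
    bool is_bvi[BD_MAX_MEMBERS];
    size_t n_members;
} bridge_domain_t;

/* Add an interface to the BD; returns false when the BD is full. */
static bool bd_add_member(bridge_domain_t *bd, int sw_if_index, bool bvi)
{
    assert(bd != NULL);
    if (bd->n_members >= BD_MAX_MEMBERS)
        return false;
    bd->members[bd->n_members] = sw_if_index;
    bd->is_bvi[bd->n_members] = bvi;
    bd->n_members++;
    return true;
}

/* Return the BVI member's sw_if_index, or -1 if the BD has no BVI. */
static int bd_bvi(const bridge_domain_t *bd)
{
    for (size_t i = 0; i < bd->n_members; i++)
        if (bd->is_bvi[i])
            return bd->members[i];
    return -1;
}

/* Fill out[] with every member except the interface the frame arrived on
 * and return how many were written (the flood set for a broadcast). */
static size_t bd_flood_set(const bridge_domain_t *bd, int ingress,
                           int out[BD_MAX_MEMBERS])
{
    size_t n = 0;
    for (size_t i = 0; i < bd->n_members; i++)
        if (bd->members[i] != ingress)
            out[n++] = bd->members[i];
    return n;
}
```

With bvi10 (sw_if_index 1) and tap10.10 (sw_if_index 3) as members, which are the If-idx values that `show bridge-domain 10 detail` reports later, a broadcast received on tap10.10 is flooded only to bvi10.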
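A dot1q termination sub-interface identifies and processes the VLAN tag in incoming frames, enabling Layer 3 communication between different VLANs and supporting more complex network architectures. This article only walks through the VPP CLI configuration and the resulting forwarding path.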
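The tag a dot1q termination sub-interface inspects is the 4-byte 802.1Q header placed right after the source MAC: a 16-bit TPID of 0x8100 followed by a 16-bit TCI holding a 3-bit priority (PCP), a 1-bit DEI and a 12-bit VLAN ID. A minimal C parser, assuming a single outer tag in a plain byte buffer (not a VPP buffer; all names are hypothetical), could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TPID_8021Q 0x8100u /* 802.1Q Tag Protocol Identifier */

typedef struct {
    uint8_t  pcp;        /* priority code point (3 bits) */
    uint8_t  dei;        /* drop eligible indicator (1 bit) */
    uint16_t vid;        /* VLAN ID (12 bits) */
    uint16_t inner_type; /* EtherType carried after the tag */
} dot1q_tag_t;

/* Read a big-endian (network order) 16-bit field. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Layout: dst MAC(6) | src MAC(6) | TPID(2) | TCI(2) | EtherType(2).
 * Returns true and fills *tag if the frame carries an 802.1Q tag. */
static bool parse_dot1q(const uint8_t *frame, size_t len, dot1q_tag_t *tag)
{
    assert(frame != NULL && tag != NULL);
    if (len < 18 || rd16(frame + 12) != TPID_8021Q)
        return false;
    uint16_t tci = rd16(frame + 14);
    tag->pcp = (uint8_t)(tci >> 13);
    tag->dei = (uint8_t)((tci >> 12) & 0x1);
    tag->vid = (uint16_t)(tci & 0x0FFF);
    tag->inner_type = rd16(frame + 16);
    return true;
}
```

Fed the first 18 bytes of the echo reply as it leaves VPP later in the trace (`data 81 00 00 0a 08 00 ...`), it reports VLAN ID 10, PCP 0 and inner EtherType 0x0800 (IPv4).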
Now the configuration itself. First, create bridge domain 10 in VPP:
```
create bridge-domain 10
```
Create a BVI interface, add it to bridge domain 10 as the BVI, then bring bvi10 up and assign its IP address:
```
bvi create instance 10
set interface l2 bridge bvi10 10 bvi
set interface state bvi10 up
set interface ip address bvi10 192.168.1.1/24
```
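Assigning 192.168.1.1/24 to bvi10 makes the whole 192.168.1.0/24 subnet directly reachable through bvi10 (a connected route), which is why the kernel side can use another address in that range, 192.168.1.2. The test behind a connected route is a masked comparison. The sketch below shows only that arithmetic with hypothetical helper names; it is not how VPP's FIB performs lookups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build an IPv4 address in host byte order from its four octets. */
static uint32_t ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Netmask for a prefix length 0..32 (shifting a 32-bit value by 32 is
 * undefined in C, so /0 is handled separately). */
static uint32_t prefix_mask(unsigned plen)
{
    assert(plen <= 32);
    return plen == 0 ? 0 : 0xFFFFFFFFu << (32 - plen);
}

/* True if addr lies in the connected prefix iface_addr/plen. */
static bool in_connected_prefix(uint32_t iface_addr, unsigned plen,
                                uint32_t addr)
{
    uint32_t m = prefix_mask(plen);
    return (iface_addr & m) == (addr & m);
}
```

For example, `in_connected_prefix(ip4(192,168,1,1), 24, ip4(192,168,1,2))` is true, while 192.168.2.2 would fall outside the bvi10 subnet.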
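Create a tap interface (id 10) for communication between the kernel and VPP, naming the kernel-side interface tap10. Then create a sub-interface for VLAN ID 10 on it and add the sub-interface to BD 10. Finally, set the tag-rewrite on tap10.10 to pop 1, i.e. strip the VLAN tag from received frames (similar to configuring a dot1q termination sub-interface on a Huawei device).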
```
create tap id 10 host-if-name tap10
create sub-interfaces tap10 10
set interface l2 bridge tap10.10 10
set interface state tap10 up
set interface state tap10.10 up
set interface l2 tag-rewrite tap10.10 pop 1
```
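`pop 1` removes exactly one 4-byte 802.1Q tag from each frame received on tap10.10. One way to do this without copying the payload is to slide the 12 bytes of destination and source MAC forward over the tag, so the untagged frame simply starts 4 bytes later. The sketch below assumes a plain byte array rather than a VPP buffer; `pop_one_tag()` is a hypothetical helper, not VPP's `l2_vtr_process()`, and treating an untagged frame as a drop is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Strip the outer 802.1Q tag from frame[0..len). The 12 MAC bytes are moved
 * 4 bytes forward over the tag, so the untagged frame starts at frame + 4.
 * Returns that start offset (4), or -1 meaning "drop" if no tag is present.
 * On success *new_len holds the untagged length. Hypothetical helper. */
static int pop_one_tag(uint8_t *frame, size_t len, size_t *new_len)
{
    assert(frame != NULL && new_len != NULL);
    if (len < 18 || frame[12] != 0x81 || frame[13] != 0x00)
        return -1;
    memmove(frame + 4, frame, 12); /* regions overlap: memmove, not memcpy */
    *new_len = len - 4;
    return 4;
}
```

After the pop, the two bytes following the MAC addresses are 08 00 (IPv4), which matches the `l2-input-vtr ... data 08 00 45 00` line in the trace later in this article.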
The detailed configuration and member interfaces of BD 10 can be displayed with `show bridge-domain 10 detail`:
```
DBGvpp# show bridge-domain 10 detail
  BD-ID  Index  BSN  Age(min)  Learning  U-Forwrd  UU-Flood  Flooding  ARP-Term  arp-ufwd  Learn-co  Learn-li  BVI-Intf
   10      1     0     off        on        on      flood       on       off       off        0     16777216    bvi10
span-l2-input l2-input-classify l2-input-feat-arc l2-policer-classify l2-input-acl vpath-input-l2 l2-ip-qos-record l2-input-vtr l2-learn l2-rw l2-fwd l2-flood l2-flood l2-output
   Interface   If-idx  ISN  SHG  BVI  TxFlood  VLAN-Tag-Rewrite
     bvi10       1      1    0    *      *          none
   tap10.10      3      1    0    -      *          pop-1
```
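Meaning of the main fields:

- BSN: the bridge domain's sequence number, used when flushing the MAC table.
- Age(min): the MAC aging time of the bridge domain; MAC aging is off by default.
- Learning: whether MAC learning is enabled in the bridge domain.
- U-Forwrd: whether forwarding of known unicast frames is enabled.
- UU-Flood: whether L2 unknown-unicast flooding is enabled (enabled by default).
- Flooding: whether flooding is enabled (enabled by default).
- ARP-Term: whether the bridge domain terminates ARP requests, i.e. answers them itself (disabled by default).
- arp-ufwd: whether L2 unicast forwarding of ARP is enabled (disabled by default).
- Learn-co: the number of MAC entries currently learned.
- Learn-li: the maximum number of MAC entries the bridge domain may learn.
- BVI-Intf: the bridge domain's BVI interface.

In the interface table, bvi10 (If-idx 1) carries the `*` in the BVI column, and tap10.10 (If-idx 3) shows the `pop-1` VLAN tag rewrite configured above.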
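Several of these columns (Learning, UU-Flood, Learn-co and Learn-li) together decide what happens to a unicast frame. The toy model below uses a fixed 16-entry table and hypothetical names; VPP's real MAC table is a hash table and the limit shown above is 16777216. It learns source MACs up to the limit, keeps a static entry for the BVI's MAC (the trace later shows `static age-not bvi` in the l2-fwd result), and either forwards to a known interface, floods, or drops an unknown unicast depending on UU-Flood.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_TABLE_SIZE 16
enum { FWD_DROP = -2, FWD_FLOOD = -1 }; /* otherwise: output sw_if_index */

typedef struct {
    uint8_t mac[6];
    int sw_if_index;
    bool is_static; /* e.g. the BVI's own MAC */
    bool used;
} mac_entry_t;

typedef struct {
    bool learning;        /* "Learning" column */
    bool uu_flood;        /* "UU-Flood" column */
    uint32_t learn_count; /* "Learn-co": dynamically learned entries */
    uint32_t learn_limit; /* "Learn-li" */
    mac_entry_t table[TOY_TABLE_SIZE];
} toy_bd_t;

static mac_entry_t *bd_find(toy_bd_t *bd, const uint8_t mac[6])
{
    for (int i = 0; i < TOY_TABLE_SIZE; i++)
        if (bd->table[i].used && memcmp(bd->table[i].mac, mac, 6) == 0)
            return &bd->table[i];
    return NULL;
}

/* Insert into the first free slot; returns false when the table is full. */
static bool bd_insert(toy_bd_t *bd, const uint8_t mac[6], int sw_if_index,
                      bool is_static)
{
    assert(bd != NULL && mac != NULL);
    for (int i = 0; i < TOY_TABLE_SIZE; i++) {
        mac_entry_t *e = &bd->table[i];
        if (!e->used) {
            memcpy(e->mac, mac, 6);
            e->sw_if_index = sw_if_index;
            e->is_static = is_static;
            e->used = true;
            return true;
        }
    }
    return false;
}

/* Learning step: record src MAC -> ingress interface. Static entries are
 * never overwritten, and the limit only counts dynamic entries. */
static void bd_learn(toy_bd_t *bd, const uint8_t src[6], int sw_if_index)
{
    if (!bd->learning)
        return;
    mac_entry_t *e = bd_find(bd, src);
    if (e != NULL) {
        if (!e->is_static)
            e->sw_if_index = sw_if_index; /* station moved */
        return;
    }
    if (bd->learn_count < bd->learn_limit &&
        bd_insert(bd, src, sw_if_index, false))
        bd->learn_count++;
}

/* Forwarding step for a unicast destination MAC. */
static int bd_forward(toy_bd_t *bd, const uint8_t dst[6])
{
    mac_entry_t *e = bd_find(bd, dst);
    if (e != NULL)
        return e->sw_if_index;
    return bd->uu_flood ? FWD_FLOOD : FWD_DROP;
}
```

A typical use: install bvi10's MAC b0:b0:00:00:00:0a as a static entry on sw_if_index 1, call `bd_learn()` with the kernel's source MAC on sw_if_index 3, and replies to that MAC are then forwarded to 3 instead of flooded.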
Next, create the VLAN 10 sub-interface tap10.10 in the kernel and assign it 192.168.1.2/24. The sub-interface has to be brought up manually.
```
ip link add link tap10 name tap10.10 type vlan id 10
ip link set tap10.10 up
ip addr add 192.168.1.2/24 dev tap10.10
```
Check the configuration of the kernel interfaces tap10 and tap10.10:
```
root@learning-vpp:~# ip addr show dev tap10
5: tap10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN group default qlen 1000
    link/ether 02:fe:c3:bf:50:76 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fe:c3ff:febf:5076/64 scope link
       valid_lft forever preferred_lft forever
root@learning-vpp:~# ip addr show dev tap10.10
6: tap10.10@tap10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 02:fe:c3:bf:50:76 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.2/24 scope global tap10.10
       valid_lft forever preferred_lft forever
    inet6 fe80::fe:c3ff:febf:5076/64 scope link
       valid_lft forever preferred_lft forever
```
Next, ping the BVI address 192.168.1.1 from the kernel and use a VPP packet trace to capture how the ping packets are forwarded:
```
DBGvpp# clear trace
DBGvpp# trace add virtio-input 2
DBGvpp# show trace
------------------- Start of thread 0 vpp_main -------------------
Packet 1
00:04:43:114643: virtio-input
  virtio: hw_if_index 2 next-index 4 vring 0 len 102
    hdr: flags 0x00 gso_type 0x00 hdr_len 0 gso_size 0 csum_start 0 csum_offset 0 num_buffers 1
00:04:43:114666: ethernet-input
  frame: flags 0x1, hw-if-index 2, sw-if-index 2
  IP4: 02:fe:c3:bf:50:76 -> b0:b0:00:00:00:0a 802.1q vlan 10
00:04:43:114686: l2-input
  l2-input: sw_if_index 3 dst b0:b0:00:00:00:0a src 02:fe:c3:bf:50:76 [l2-input-vtr l2-learn l2-fwd l2-flood l2-flood ]
00:04:43:114691: l2-input-vtr
  l2-input-vtr: sw_if_index 3 dst b0:b0:00:00:00:0a src 02:fe:c3:bf:50:76 data 08 00 45 00 00 54 34 40 40 00 40 01
00:04:43:114697: l2-learn
  l2-learn: sw_if_index 3 dst b0:b0:00:00:00:0a src 02:fe:c3:bf:50:76 bd_index 1
00:04:43:114703: l2-fwd
  l2-fwd: sw_if_index 3 dst b0:b0:00:00:00:0a src 02:fe:c3:bf:50:76 bd_index 1 result [0x700000001, 1] static age-not bvi
00:04:43:114710: ip4-input
  ICMP: 192.168.1.2 -> 192.168.1.1
    tos 0x00, ttl 64, length 84, checksum 0x8315 dscp CS0 ecn NON_ECN
    fragment id 0x3440, flags DONT_FRAGMENT
  ICMP echo_request checksum 0x45d6 id 1
00:04:43:114731: ip4-lookup
  fib 0 dpo-idx 7 flow hash: 0x00000000
  ICMP: 192.168.1.2 -> 192.168.1.1
    tos 0x00, ttl 64, length 84, checksum 0x8315 dscp CS0 ecn NON_ECN
    fragment id 0x3440, flags DONT_FRAGMENT
  ICMP echo_request checksum 0x45d6 id 1
00:04:43:114742: ip4-receive
  ICMP: 192.168.1.2 -> 192.168.1.1
    tos 0x00, ttl 64, length 84, checksum 0x8315 dscp CS0 ecn NON_ECN
    fragment id 0x3440, flags DONT_FRAGMENT
  ICMP echo_request checksum 0x45d6 id 1
00:04:43:114746: ip4-icmp-input
  ICMP: 192.168.1.2 -> 192.168.1.1
    tos 0x00, ttl 64, length 84, checksum 0x8315 dscp CS0 ecn NON_ECN
    fragment id 0x3440, flags DONT_FRAGMENT
  ICMP echo_request checksum 0x45d6 id 1
00:04:43:114752: ip4-icmp-echo-request
  ICMP: 192.168.1.2 -> 192.168.1.1
    tos 0x00, ttl 64, length 84, checksum 0x8315 dscp CS0 ecn NON_ECN
    fragment id 0x3440, flags DONT_FRAGMENT
  ICMP echo_request checksum 0x45d6 id 1
00:04:43:114760: ip4-load-balance
  fib 0 dpo-idx 2 flow hash: 0x00000000
  ICMP: 192.168.1.1 -> 192.168.1.2
    tos 0x00, ttl 64, length 84, checksum 0xa5ff dscp CS0 ecn NON_ECN
    fragment id 0x1156, flags DONT_FRAGMENT
  ICMP echo_reply checksum 0x4dd6 id 1
00:04:43:114763: ip4-rewrite
  tx_sw_if_index 1 dpo-idx 2 : ipv4 via 192.168.1.2 bvi10: mtu:9000 next:3 flags:[] 02fec3bf5076b0b00000000a0800 flow hash: 0x00000000
  00000000: 02fec3bf5076b0b00000000a080045000054115640004001a5ffc0a80101c0a8
  00000020: 010200004dd600010045305eed6500000000d24c0300000000001011
00:04:43:114768: bvi10-output
  bvi10 flags 0x02180005
  IP4: b0:b0:00:00:00:0a -> 02:fe:c3:bf:50:76
  ICMP: 192.168.1.1 -> 192.168.1.2
    tos 0x00, ttl 64, length 84, checksum 0xa5ff dscp CS0 ecn NON_ECN
    fragment id 0x1156, flags DONT_FRAGMENT
  ICMP echo_reply checksum 0x4dd6 id 1
00:04:43:114782: l2-input
  l2-input: sw_if_index 1 dst 02:fe:c3:bf:50:76 src b0:b0:00:00:00:0a [l2-fwd l2-flood l2-flood ]
00:04:43:114784: l2-fwd
  l2-fwd: sw_if_index 1 dst 02:fe:c3:bf:50:76 src b0:b0:00:00:00:0a bd_index 1 result [0x1000000000003, 3] none
00:04:43:114786: l2-output
  l2-output: sw_if_index 3 dst 02:fe:c3:bf:50:76 src b0:b0:00:00:00:0a data 81 00 00 0a 08 00 45 00 00 54 11 56
00:04:43:114792: tap10-output
  tap10.10 flags 0x12180005
  IP4: b0:b0:00:00:00:0a -> 02:fe:c3:bf:50:76 802.1q vlan 10
  ICMP: 192.168.1.1 -> 192.168.1.2
    tos 0x00, ttl 64, length 84, checksum 0xa5ff dscp CS0 ecn NON_ECN
    fragment id 0x1156, flags DONT_FRAGMENT
  ICMP echo_reply checksum 0x4dd6 id 1
00:04:43:114794: tap10-tx
    buffer 0x9f0aa: current data 0, length 102, buffer-pool 0, ref-count 1, trace handle 0x0
                    vlan-1-deep local l2-hdr-offset 0 l3-hdr-offset 18
  hdr-sz 0 l2-hdr-offset 0 l3-hdr-offset 18 l4-hdr-offset 0 l4-hdr-sz 0
  IP4: b0:b0:00:00:00:0a -> 02:fe:c3:bf:50:76 802.1q vlan 10
  ICMP: 192.168.1.1 -> 192.168.1.2
    tos 0x00, ttl 64, length 84, checksum 0xa5ff dscp CS0 ecn NON_ECN
    fragment id 0x1156, flags DONT_FRAGMENT
  ICMP echo_reply checksum 0x4dd6 id 1
```
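The trace prints raw header bytes at several points. The ip4-rewrite line, for example, carries the 14-byte Ethernet header VPP prepends to the echo reply, `02fec3bf5076b0b00000000a0800`: the destination MAC (the kernel's tap10.10), the source MAC (bvi10) and EtherType 0x0800. There is no VLAN tag at that stage; it only appears in l2-output (`data 81 00 00 0a ...`) once the frame is bridged out of tap10.10. A small decoder for such hex strings, with hypothetical helper names, makes this easy to check:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode an even-length hex string into out[]; returns the number of
 * bytes written, or -1 on odd length, overflow or a bad digit. */
static int hex_decode(const char *s, uint8_t *out, size_t cap)
{
    assert(s != NULL && out != NULL);
    size_t n = strlen(s);
    if (n % 2 != 0 || n / 2 > cap)
        return -1;
    for (size_t i = 0; i < n / 2; i++) {
        int hi = hexval(s[2 * i]), lo = hexval(s[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    return (int)(n / 2);
}

/* Format 6 bytes as aa:bb:cc:dd:ee:ff (buf needs 18 bytes incl. NUL). */
static void mac_str(const uint8_t *m, char buf[18])
{
    snprintf(buf, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
             m[0], m[1], m[2], m[3], m[4], m[5]);
}
```

Decoding the rewrite string yields 14 bytes; `mac_str(h)` gives 02:fe:c3:bf:50:76, `mac_str(h + 6)` gives b0:b0:00:00:00:0a, and bytes 12 and 13 form 0x0800.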
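The trace shows that the VLAN tag is stripped in the l2-input-vtr node: after that node the data following the MAC addresses starts with 08 00 (the IPv4 EtherType) instead of the 802.1Q tag. The frame then goes through the l2-learn node for MAC learning, and l2-fwd looks up the MAC table and forwards the frame to bvi10, where it enters the L3 path (ip4-input, ip4-lookup). The l2-input-vtr node is enabled on the interface when `set interface l2 tag-rewrite tap10.10 pop 1` is applied. The key processing logic is as follows: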
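```c
/* l2-input-vtr node: per-packet processing (excerpt).
 * config0 is the L2 config of the receiving interface; it holds both the
 * input_vtr and output_vtr rewrite rules, and out_vtr_flag is set when a
 * tag-rewrite has been configured on that interface. */
if (PREDICT_FALSE (config0->out_vtr_flag))
  {
    if (config0->output_vtr.push_and_pop_bytes)
      {
        /* apply the input tag rewrite (here: pop 1) to packet b0 */
        if (l2_vtr_process (b0, &config0->input_vtr))
          {
            /* non-zero return: the rewrite could not be applied, drop */
            next0 = L2_INVTR_NEXT_DROP;
            b0->error = node->errors[L2_INVTR_ERROR_DROP];
          }
      }
    else if (config0->output_pbb_vtr.push_and_pop_bytes)
      {
        /* PBB (802.1ah) tag rewrite variant */
        if (l2_pbb_process (b0, &(config0->input_pbb_vtr)))
          {
            /* Drop packet */
            next0 = L2_INVTR_NEXT_DROP;
            b0->error = node->errors[L2_INVTR_ERROR_DROP];
          }
      }
  }
```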
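So where is the VLAN tag added back in the downstream direction (VPP to kernel)? In the l2-output node: it checks whether the output interface's VTR flag is set and whether any push/pop bytes are configured, and if so pushes or pops tags accordingly. The code is: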
```c
/* l2-output node (excerpt): pick the batch-processing variant from the
 * output interface's configuration. */
/* VTR */
if (config->out_vtr_flag && config->output_vtr.push_and_pop_bytes)
  {
    /* 802.1Q tag rewrite configured (e.g. push back the tag popped on input) */
    if (feature_bitmap & L2OUTPUT_FEAT_EFP_FILTER)
      l2output_process_batch (vm, node, config, b, cdo, next, count,
                              /* l2_efp */ 1,
                              /* l2_vtr */ 1,
                              /* l2_pbb */ 0);
    else
      l2output_process_batch (vm, node, config, b, cdo, next, count,
                              /* l2_efp */ 0,
                              /* l2_vtr */ 1,
                              /* l2_pbb */ 0);
  }
else if (config->out_vtr_flag &&
         config->output_pbb_vtr.push_and_pop_bytes)
  /* PBB tag rewrite configured */
  l2output_process_batch (vm, node, config, b, cdo, next, count,
                          /* l2_efp */ 0,
                          /* l2_vtr */ 0,
                          /* l2_pbb */ 1);
else
  /* no tag rewrite on this interface */
  l2output_process_batch (vm, node, config, b, cdo, next, count,
                          /* l2_efp */ 0,
                          /* l2_vtr */ 0,
                          /* l2_pbb */ 0);
```
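On the way back, the output rule for tap10.10 pushes one tag with VLAN ID 10, which is exactly what the l2-output trace line shows (`data 81 00 00 0a 08 00 ...`). The sketch below is the pop example in reverse: it assumes the frame sits in a byte array with at least 4 bytes of headroom in front of it, moves the MAC addresses back by 4 bytes and writes TPID 0x8100 plus a TCI with PCP 0, DEI 0 and the given VLAN ID. `push_one_tag()` is a hypothetical helper, not a VPP function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The frame occupies buf[*start .. *start + *len). Push one 802.1Q tag
 * (PCP 0, DEI 0, VLAN ID vid) using 4 bytes of headroom before the frame.
 * Returns false if there is no headroom, the frame is too short or the
 * VLAN ID does not fit in 12 bits. Hypothetical helper. */
static bool push_one_tag(uint8_t *buf, size_t *start, size_t *len,
                         uint16_t vid)
{
    assert(buf != NULL && start != NULL && len != NULL);
    if (*start < 4 || *len < 14 || vid > 0x0FFF)
        return false;
    uint8_t *f = buf + *start - 4;  /* new frame start */
    memmove(f, f + 4, 12);          /* slide dst+src MAC back by 4 bytes */
    f[12] = 0x81;                   /* TPID 0x8100 */
    f[13] = 0x00;
    f[14] = (uint8_t)(vid >> 8);    /* TCI: PCP=0, DEI=0, VID high bits */
    f[15] = (uint8_t)(vid & 0xFF);  /* VID low byte */
    *start -= 4;
    *len += 4;
    return true;
}
```

Reserving headroom is a common design choice in packet buffers because it lets a header be prepended without moving the payload; only the 12 MAC bytes are copied.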
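In other words, configuring a tag-rewrite enables the l2-input-vtr node on the interface and generates both the input and the output rewrite configuration. The relationship between the configuration structures used by the l2-input-vtr and l2-output nodes is shown in the author's diagram (see the note at the end):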
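This concludes a brief look at configuring dot1q termination sub-interfaces. L2 processing is a complex part of VPP; this article only uses a simple example to illustrate the L2 forwarding logic, and the details require reading the code in depth.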
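⚠️ The L2 data-structure diagrams referred to in this article are in the author's personal GitHub repository; reply "资料" to the author's WeChat official account to get the repository link.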
References:
1. Huawei switch `dot1q termination vid` command reference: https://support.huawei.com/enterprise/zh/doc/EDOC1000128396/c51dda43
2. FD.io/VPP: learning L2 vSwitch configuration: https://blog.51cto.com/u_15301988/5181050
3. VPP VLAN study: https://ipng.ch/s/articles/2022/02/14/vpp-vlan-gym.html