…at the beginning of this writing, my wish was to use this white paper to talk about the Cisco ACI Multi-POD solution, but then I thought it would be much better to debate it using a pragmatic approach (it's too boring to just restate something that is already written somewhere); let's give it a spark! 🙂

Why not face the real world, with real numbers, to get a better feeling of what Multi-POD means compared with a single-POD implementation (and do we want to avoid talking about its predecessor, the stretched solution? no)?

…however, to have a good start, I'm forced to introduce the basic concepts necessary to better understand what follows later on; let's have a quick look at the 3 different scenarios:

[Figure: the 3 scenarios (single POD, stretched fabric, Multi-POD)]

The first consideration that comes up is the number of fibers to be used and, just a few seconds later, the costs that each choice involves. 🙁

[Figure: fiber and optics counts required by each scenario]

Let's suppose now that we already have 40 BiDi optics in our pocket (a present from Santa Claus 🙂) and have to share with the Board of Directors the business case about the delta of $$$ to be invested ONLY in the optics (and new devices) that each solution requires as extras:

[Table: indicative extra optics and device costs for each solution]

Obviously, the previous ones are just rough prices, but they give an idea of the direction each solution takes.

RTT < 50 msec is the requirement that has to be satisfied for a single fabric geographically distributed, whether stretched or Multi-POD; both of them certainly save a lot of the expensive WSP-Q40GLR4L (<2 km) or QSFP-40G-LR4-S (<10 km) optics that would have to be used in the single-POD solution if the leaves and spines had to be geographically distributed over long distances.

Now let's examine the problem from another point of view, the pros and cons that the Multi-POD scenario involves:

Pros:

  • A single controller domain (one cluster made of 3 APICs, plus a 4th one as standby backup)
  • One fabric but two separate domains for ISIS, BGP and COOP, so a disaster in one of the two PODs has no effect on the remaining one
  • A greater distance between the 2 PODs can be covered
  • Only 8 fibers and 16 optical transceivers are necessary to interconnect the 2 PODs, a limited number compared with the single POD

Cons:

  • The number of hops to cross the fabric becomes at most 4 (we are considering end-to-end traffic between two endpoints distributed between the 2 PODs; no L3 services such as load balancers and/or firewalls, which would increase the number of hops, are considered in the middle)
  • The VTEP address space has to be separated between the two PODs
  • The IPN nodes have to support: OSPF, PIM bidir, DHCP relay, an MTU larger than and consistent with the one configured on the spine nodes, and QoS

 

At this point it is useless to spend more time trying to justify the reasons that could lead to choosing Multi-POD instead of the stretched or even the single-POD scenario; just accept that the main topic of this white paper is the Multi-POD solution, and that is what we'll deal with… for now 🙂

I think it could be interesting to summarize the specifications that have to be satisfied in order to implement the Multi-POD scenario (the values used here are just an assumption 🙂; a configuration sketch for the IPN side follows the list):

  • Different VTEP address spaces for the 2 PODs (in our case 10.1.8.0/21 for POD1 and 10.1.16.0/21 for POD2); pay attention when configuring the 3rd APIC because, even though it is in POD2, in order for it to be included in the cluster with the other two, it has to be assigned an address from 10.1.8.0/21 and not from 10.1.16.0/21
  • MTU = 9150 configured on all the IPN and spine node interfaces that face each other and have to run OSPF
  • An address space (a private one in our case) assigned to the External TEP (ETEP) of the IPN and spine network infrastructure: 192.168.1.0/24 and 192.168.2.0/24, one for each POD where the IPN nodes are hosted; within these two ETEP address spaces, two anycast IP addresses have to be selected to be used as the EVPN next-hop IP addresses for the inter-POD data-plane traffic (192.168.1.254/24 and 192.168.2.254/24)
  • DHCP relay functionality has to be implemented on the IPN nodes, pointing to APIC 1, 2 and 3 (10.1.8.1, 10.1.8.2 and 10.1.8.3 in our case), in order to allow the discovery of the remaining part of the second POD once all the devices have been powered up
  • PIM bidir has to be configured on the IPN nodes
  • MP-BGP EVPN has to be configured among the spine nodes
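
To make the list above more concrete, here is a minimal sketch of what the corresponding NX-OS configuration of a spine-facing interface could look like on an IPN node. Treat it as an illustration only: the interface number and the /30 address follow the outputs shown later in this paper, while the OSPF process name, the RP address and the multicast group range are assumptions of this example (for PIM bidir redundancy Cisco actually recommends a phantom-RP design, not shown here):

feature ospf
feature pim
feature dhcp
feature lldp

ip dhcp relay

vrf context IPN
  ! hypothetical bidir RP covering the default ACI GIPo multicast range
  ip pim rp-address 192.168.100.1 group-list 225.0.0.0/15 bidir

router ospf MULTIPOD
  vrf IPN

interface Ethernet1/49
  mtu 9216
  no shutdown

interface Ethernet1/49.4
  ! the .4 sub-interface with dot1q tag 4 is the one strictly required by Cisco
  description link towards Spine01 of POD1
  mtu 9150
  encapsulation dot1q 4
  vrf member IPN
  ip address 192.168.1.1/30
  ip ospf network point-to-point
  ip router ospf MULTIPOD area 0.0.0.0
  ip pim sparse-mode
  ! DHCP relay towards the three APICs, for the auto-discovery of the POD2 devices
  ip dhcp relay address 10.1.8.1
  ip dhcp relay address 10.1.8.2
  ip dhcp relay address 10.1.8.3
  no shutdown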

 

Briefly, all that stuff allows us to:

  • finalize the discovery of the spines, the leaves and the 3rd APIC in POD2
  • mutually redistribute the ISIS VTEP prefixes of the two PODs into OSPF on the External TEP infrastructure, and vice versa
  • realize the MP-BGP EVPN peering among the spine devices, to mutually exchange the L2/L3 information concerning the endpoints attached to the two PODs

One thing that is important to note is that the interfaces on the IPN nodes used to bring up the OSPF adjacency with the spine nodes (used to propagate the loopback0 of the spines, needed for the MP-BGP peering) must be the ".4" sub-interfaces, because that is strictly required by Cisco; accept it as a religion! 🙂

Before moving on with the detailed analysis of the Multi-POD implementation, I'd like to present a quick excursus on the protocols used to let ACI work in this scenario.

First of all, the IGP routing protocols used: ISIS and OSPF.

ISIS is necessary to let the leaves know how to reach the other VTEPs spread among them (remember that the VTEP address is the loopback used by each leaf to terminate the VXLAN tunnels). OSPF is used to propagate the spine loopbacks (necessary for the MP-BGP EVPN and L3VPN peerings) and the VTEP addresses originally announced by ISIS (via mutual redistribution).

Then we have COOP, used for the purpose of distributing endpoint information to the spine switches. Spine switches never use COOP to distribute endpoint information to the leaf switches; instead, the spines record the information learned via COOP in the Global Proxy Table, and this information is used to resolve unknown destination MAC/IP addresses when traffic is sent to the proxy address.

Finally, we have MP-BGP EVPN and MP-BGP L3VPN.

MP-BGP L3VPN is implemented between the leaf and spine switches and among the spines of the two PODs, to propagate IP prefixes within the ACI fabric (in the Multi-POD scenario, specifically, the propagated prefixes are, besides the ones concerning the external world, also those of the endpoints geographically distributed over the two PODs).

MP-BGP EVPN is implemented among the spines of the two PODs to propagate the L2 information (the MAC addresses) of the endpoints geographically distributed over the two PODs; each spine, in fact, collects the L2 information of the endpoints attached to the fabric via COOP and then, in the Multi-POD solution, propagates it to the remote spines.

If you have been lucky (or smart, or both :-)), once everything has been configured, you should see something very close to this figure in the ACI GUI:

[Figure: ACI GUI topology view of the Multi-POD fabric]

It's time now to start playing a little with the info we have got! 🙂 🙂

Starting from here, the topic approaches the Multi-POD solution more in depth; it could become a little boring and too technical, so it's up to you whether to take a break and go outside for some sunbathing, or to keep on reading!

At the moment of this writing, we don't have anything connected except the APIC nodes attached to the leaves, so I'll be using them to surf the technology underneath the ACI framework.

As you know, the APICs talk with the ACI infrastructure in the overlay-1 VRF, which at L2 corresponds to VLAN ID 3967 (defined at the moment of the APIC node configuration). That VLAN is used for the control plane of the ACI infrastructure; ISIS, BGP, … everything concerning control-plane messages flows on it.
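
By the way, since the APIC is a Linux machine underneath, you can watch that control-plane chatter yourself; a quick capture on the bond sub-interface could look like the following (a hypothetical example: the availability of tcpdump and its options may vary with the APIC software release):

***-APIC1# tcpdump -ni bond0.3967 -c 10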

So, if we focus on APIC1 we see:

***-APIC1# ifconfig bond0.3967
bond0.3967: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1496
inet 10.1.8.1 netmask 255.255.255.255 broadcast 10.1.8.1
inet6 fe80::2be:75ff:fee0:90ec prefixlen 64 scopeid 0x20<link>
ether 00:be:75:e0:90:ec txqueuelen 1000 (Ethernet)
RX packets 417043110 bytes 65499106055 (61.0 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 399617309 bytes 88890450171 (82.7 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

…the same for APIC2:

***-APIC2# ifconfig bond0.3967
bond0.3967: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1496
inet 10.1.8.2 netmask 255.255.255.255 broadcast 10.1.8.2
inet6 fe80::2be:75ff:fee0:a0f0 prefixlen 64 scopeid 0x20<link>
ether 00:be:75:e0:a0:f0 txqueuelen 1000 (Ethernet)
RX packets 329928530 bytes 60456267590 (56.3 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 379560581 bytes 87266336828 (81.2 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

…and finally for APIC3:

***-APIC3# ifconfig bond0.3967
bond0.3967: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1496
inet 10.1.8.3 netmask 255.255.255.255 broadcast 10.1.8.3
inet6 fe80::2be:75ff:fee0:aabc prefixlen 64 scopeid 0x20<link>
ether 00:be:75:e0:aa:bc txqueuelen 1000 (Ethernet)
RX packets 20162673 bytes 8701267669 (8.1 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 17085681 bytes 4275621922 (3.9 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Keep in mind the MAC address associated with APIC3 (00:be:75:e0:aa:bc), because later on it will be used for troubleshooting and to investigate what is going on inside the ACI solution.

Let's move on; let's have a look at the ISIS adjacencies, the LLDP neighborships… inside the ACI fabric:

***-Leaf01# show isis adjacency vrf overlay-1
IS-IS process: isis_infra VRF:overlay-1
IS-IS adjacency database:
System ID SNPA Level State Hold Time Interface
410C.010A.0000 N/A 1 UP 00:00:58 Ethernet1/49.7
420C.010A.0000 N/A 1 UP 00:00:56 Ethernet1/50.8

Did you see? As said before, the overlay-1 VRF is the master! 🙂 The ISIS adjacency is realized on it! Another thing you should pay attention to is the sub-interface used by Leaf01 to bring up the ISIS adjacency with its neighbors: it changes on each p2p connection and is managed automatically by ACI (we really don't have to care about it).

410C.010A.0000 is the ISIS System ID of Spine01 of POD1, which Leaf01 is in neighborship with.

***-Spine01# show isis protocol vrf overlay-1
ISIS process : isis_infra
VRF: overlay-1
System ID : 41:0C:01:0A:00:00 IS-Type : L1
SAP : 412 Queue Handle : 14
Maximum LSP MTU: 1492
Metric-style : advertise(narrow, wide), accept(narrow, wide)
Area address(es) :
01
Process is up and running
Interfaces supported by IS-IS :
loopback0
Ethernet1/6.70
Ethernet1/4.68
Ethernet1/2.69
Ethernet1/1.72
Ethernet1/5.71
Ethernet1/3.67
loopback1
loopback2
loopback3
loopback4
loopback5
loopback6
loopback7
loopback8
loopback9
loopback10
loopback12
Address family IPv4 unicast :
Number of interface : 18
Adjacency check disabled
Distance : 115

Just as confirmation, in case you don’t believe me:

***-Leaf01# show lldp neighbors
Capability codes:
(R) Router, (B) Bridge, (T) Telephone, (C) DOCSIS Cable Device
(W) WLAN Access Point, (P) Repeater, (S) Station, (O) Other
Device ID Local Intf Hold-time Capability Port ID
***-APIC1 Eth1/48 120 eth2-1
***-Spine01 Eth1/49 120 BR Eth1/1
***-Spine02 Eth1/50 120 BR Eth1/1
Total entries displayed: 3

And now, one of the most important commands of the ACI implementation, which lets you know how ACI internally maps the VLAN ID you configured some minutes before for the APIC; in this case you can see that VLAN 3967, used by the APIC to talk with the rest of the ACI world, has been mapped internally to VLAN 9. This means that APIC1 is connected to the ACI infrastructure on Leaf01 via Eth1/48.9, also known as the infra VLAN, mapped at L3 into the overlay-1 VRF.

***-Leaf01# show vlan extended

VLAN Name Status Ports
—- ——————————– ——— ——————————-
9 infra:default active Eth1/48

VLAN Type Vlan-mode Encap
—- —– ———- ——————————-
9 enet CE vxlan-16777209, vlan-3967

Another interesting thing, peculiar to the ACI implementation of the VXLAN paradigm, is the IP addressing assigned automatically to the p2p links between leaves and spines; you would have expected to find /30 addresses everywhere, but, as you can see, Leaf01 (and the spines as well) uses as the IP address of all its p2p ISIS links its own loopback0, i.e. the links are unnumbered (keep this in mind for when, later, we analyze the ISIS routing).

***-Leaf01# show ip interface brief vrf overlay-1
IP Interface Status for VRF "overlay-1"(4)
Interface Address Interface Status
eth1/49 unassigned protocol-up/link-up/admin-up
eth1/49.7 unnumbered protocol-up/link-up/admin-up
(lo0)
eth1/50 unassigned protocol-up/link-up/admin-up
eth1/50.8 unnumbered protocol-up/link-up/admin-up
(lo0)

vlan9 10.1.8.30/27 protocol-up/link-up/admin-up

***-Leaf02# show ip interface brief vrf overlay-1
IP Interface Status for VRF "overlay-1"(4)
Interface Address Interface Status
eth1/49 unassigned protocol-up/link-up/admin-up
eth1/49.8 unnumbered protocol-up/link-up/admin-up
(lo0)
eth1/50 unassigned protocol-up/link-up/admin-up
eth1/50.7 unnumbered protocol-up/link-up/admin-up
(lo0)

vlan9 10.1.8.30/27 protocol-up/link-up/admin-up

And what about the infra VLAN ID on POD2? As expected, ACI mapped 3967 to… another VLAN ID 🙂 this time VLAN ID 13, no longer 9 🙂 … very helpful for troubleshooting, isn't it?

***-Leaf03# show vlan extended

VLAN Name Status Ports
—- ——————————– ——— ——————————-
13 infra:default active Eth1/48

VLAN Type Vlan-mode Encap
—- —– ———- ——————————-
13 enet CE vxlan-16777209, vlan-3967

But, but, wait a moment: we are checking all this over SSH on Leaf03 of POD2, and that configuration has been applied by ACI, so it means that the Multi-POD is already working; the 3 APICs are in a cluster, VLAN ID 13 has been assigned to the sub-interface towards APIC3… mmhhh, interesting, the two PODs are already talking to each other, speaking the same language 🙂 and configuring each other via the central brain, the distributed APIC solution! Cool! 🙂

***-Leaf03# show lldp nei
Capability codes:
(R) Router, (B) Bridge, (T) Telephone, (C) DOCSIS Cable Device
(W) WLAN Access Point, (P) Repeater, (S) Station, (O) Other
Device ID Local Intf Hold-time Capability Port ID
***-APIC3 Eth1/48 120 eth2-1
***-Spine01.net.***.it Eth1/49 120 BR Eth1/3
***-Spine02.net.***.it Eth1/50 120 BR Eth1/3

 

So, we have the 10.1.8.30/27 IP address, assigned by ACI as part of the infra Tenant used by the APIC nodes; its scope is “Private to VRF”, which means it is not announced to anyone, and we like that, because nobody outside has to know the internal IP addresses of the APICs… but who is SVI 13 (SVI 9 on POD1), with the IP address 10.1.8.30/27?
It is the anycast gateway IP address for the subnet 10.1.8.0/27 used by the APIC nodes on the VLAN originally configured as 3967, in the overlay-1 VRF, spread over the whole ACI fabric shared by the two PODs. That means that 10.1.8.30/27 represents the default gateway for each APIC node, even if we change the leaf the APIC is connected to.

***-Leaf03# show ip interface brief vrf overlay-1
IP Interface Status for VRF "overlay-1"(4)
Interface Address Interface Status
eth1/49 unassigned protocol-up/link-up/admin-up
eth1/49.11 unnumbered protocol-up/link-up/admin-up
(lo0)
eth1/50 unassigned protocol-up/link-up/admin-up
eth1/50.12 unnumbered protocol-up/link-up/admin-up
(lo0)

vlan13 10.1.8.30/27 protocol-up/link-up/admin-up
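
Just to convince ourselves from the APIC side, the anycast gateway can also be checked with plain Linux commands (a hypothetical check, not captured on the real setup): 10.1.8.30 should answer from whatever leaf the APIC is attached to, and the APIC routes towards the fabric should point at the bond sub-interface:

***-APIC3# ping -c 3 10.1.8.30
***-APIC3# ip route show | grep bond0.3967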

Now it’s time to move on and focus on ISIS.

Let's examine the ISIS routing table on Leaf01 of POD1; what we'll see is that the prefixes that appear are the ones concerning the ISIS network infrastructure of both PODs, assigned automatically by ACI and taken from the VTEP address spaces 10.1.8.0/21 and 10.1.16.0/21, but also the address spaces concerning the External TEP, 192.168.1.0/24 and 192.168.2.0/24.
The mutual redistribution between OSPF and ISIS is working, and each leaf in the ACI implementation knows the loopback IP addresses of the other leaves of the local and remote POD (this is important because the VXLAN L3 tunnels are built using the leaf loopbacks as VTEPs; in this way, the L2 stretching is spread over the Multi-POD solution, and the VXLAN encapsulation makes it transparent, regardless of whether we are talking about the single-POD or the Multi-POD scenario).

***-Leaf01# show ip route vrf overlay-1
IP Route Table for VRF "overlay-1"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

10.1.8.0/27, ubest/mbest: 1/0, attached, direct
*via 10.1.8.30, vlan9, [1/0], 12w84d, direct
10.1.8.2/32, ubest/mbest: 2/0
*via 10.1.12.65, eth1/49.7, [115/12], 12w84d, isis-isis_infra, L1
*via 10.1.12.66, eth1/50.8, [115/12], 12w84d, isis-isis_infra, L1
10.1.8.3/32, ubest/mbest: 2/0
*via 10.1.12.65, eth1/49.7, [115/11], 2d02h, isis-isis_infra, L1
*via 10.1.12.66, eth1/50.8, [115/11], 2d02h, isis-isis_infra, L1
10.1.8.30/32, ubest/mbest: 1/0, attached
*via 10.1.8.30, vlan9, [1/0], 12w84d, local, local
10.1.8.32/32, ubest/mbest: 2/0, attached, direct
*via 10.1.8.32, lo1023, [1/0], 12w84d, local, local
*via 10.1.8.32, lo1023, [1/0], 12w84d, direct

10.1.8.193/32, ubest/mbest: 2/0
*via 10.1.12.65, eth1/49.7, [115/2], 12w84d, isis-isis_infra, L1
*via 10.1.12.66, eth1/50.8, [115/2], 12w84d, isis-isis_infra, L1

10.1.12.198/32, ubest/mbest: 1/0
*via 10.1.12.65, eth1/49.7, [115/2], 12w84d, isis-isis_infra, L1
10.1.12.199/32, ubest/mbest: 1/0
*via 10.1.12.65, eth1/49.7, [115/2], 12w84d, isis-isis_infra, L1
10.1.16.0/21, ubest/mbest: 2/0
*via 10.1.12.66, eth1/50.8, [115/64], 3d00h, isis-isis_infra, L1
*via 10.1.12.65, eth1/49.7, [115/64], 3d00h, isis-isis_infra, L1
192.168.1.0/30, ubest/mbest: 1/0
*via 10.1.12.66, eth1/50.8, [115/64], 3d20h, isis-isis_infra, L1
192.168.1.4/30, ubest/mbest: 1/0
*via 10.1.12.65, eth1/49.7, [115/64], 3d20h, isis-isis_infra, L1

192.168.2.4/30, ubest/mbest: 2/0
*via 10.1.12.66, eth1/50.8, [115/64], 3d00h, isis-isis_infra, L1
*via 10.1.12.65, eth1/49.7, [115/64], 3d00h, isis-isis_infra, L1
192.168.2.8/30, ubest/mbest: 2/0
*via 10.1.12.65, eth1/49.7, [115/64], 3d00h, isis-isis_infra, L1
*via 10.1.12.66, eth1/50.8, [115/64], 3d00h, isis-isis_infra, L1
192.168.2.12/30, ubest/mbest: 2/0
*via 10.1.12.65, eth1/49.7, [115/64], 3d00h, isis-isis_infra, L1
*via 10.1.12.66, eth1/50.8, [115/64], 3d00h, isis-isis_infra, L1
192.168.2.101/32, ubest/mbest: 2/0
*via 10.1.12.66, eth1/50.8, [115/64], 3d00h, isis-isis_infra, L1
*via 10.1.12.65, eth1/49.7, [115/64], 3d00h, isis-isis_infra, L1

If now, just out of curiosity, we check the next hop used by ISIS to reach APIC3 from POD1, and in particular from Leaf01, we'll see that:

***-Leaf01# show ip route vrf overlay-1 | grep -A 2 10.1.8.3/32
10.1.8.3/32, ubest/mbest: 2/0
*via 10.1.12.65, eth1/49.7, [115/11], 2d02h, isis-isis_infra, L1
*via 10.1.12.66, eth1/50.8, [115/11], 2d02h, isis-isis_infra, L1

Who are 10.1.12.65 and 10.1.12.66? Just the loopback0 of the spine nodes; so, as forewarned, the NH addresses are not the /30s of the p2p links (which, by the way, don't exist) but the loopbacks.

Let's change for a while the scope of our “ACI surfing” and analyze the ARP tables of the leaves.

***-Leaf01# show ip arp vrf overlay-1

Flags: * – Adjacencies learnt on non-active FHRP router
+ – Adjacencies synced via CFSoE
# – Adjacencies Throttled for Glean
D – Static Adjacencies attached to down interface

IP ARP Table for context overlay-1
Total number of entries: 1
Address Age MAC Address Interface
10.1.8.1                 00be.75e0.90ec  vlan9

***-Leaf03# show ip arp vrf overlay-1

Flags: * – Adjacencies learnt on non-active FHRP router
+ – Adjacencies synced via CFSoE
# – Adjacencies Throttled for Glean
D – Static Adjacencies attached to down interface

IP ARP Table for context overlay-1
Total number of entries: 1
Address Age MAC Address Interface
10.1.8.3                 00be.75e0.aabc  vlan13

***-Leaf03# show mac address-table dynamic interface eth1/48
Legend:
* – primary entry, G – Gateway MAC, (R) – Routed MAC, O – Overlay MAC
age – seconds since last seen,+ – primary entry using vPC Peer-Link,
(T) – True, (F) – False
VLAN MAC Address Type age Secure NTFY Ports/SWID.SSID.LID
———+—————–+——–+———+——+—-+——————
* 13 00be.75e0.aabc dynamic – F F eth1/48

***-Leaf03# show endpoint interface ethernet 1/48
Legend:
s – arp O – peer-attached a – local-aged S – static
V – vpc-attached p – peer-aged M – span L – local
B – bounce H – vtep
+———————————–+—————+—————–+————–+————-+
VLAN/ Encap MAC Address MAC Info/ Interface
Domain VLAN IP Address IP Info
+———————————–+—————+—————–+————–+————-+
13/overlay-1 vxlan-16777209 00be.75e0.aabc L eth1/48
+——————————————————————————+
Endpoint Summary
+——————————————————————————+
Total number of Local Endpoints : 1
Total number of Remote Endpoints : 0
Total number of Peer Endpoints : 0
Total number of vPC Endpoints : 0
Total number of non-vPC Endpoints : 1
Total number of MACs : 1
Total number of VTEPs : 0
Total number of Local IPs : 0
Total number of Remote IPs : 0
Total number All EPs : 1

All the previous show commands confirm what was already said about the internal mapping of the original VLAN ID 3967 and the binding of the L2 and L3 information concerning the APIC interfaces to the overlay-1 VRF.

Do you want to have a look outside the box and check what is happening between the IPN nodes and the spines?

***-IPN01# show lldp neighbors
Capability codes:
(R) Router, (B) Bridge, (T) Telephone, (C) DOCSIS Cable Device
(W) WLAN Access Point, (P) Repeater, (S) Station, (O) Other
Device ID Local Intf Hold-time Capability Port ID
***-Spine01 Eth1/49 120 BR Eth1/63
***-Spine02 Eth1/50 120 BR Eth1/63
***-Spine01.net.***.it
Eth1/51 120 BR Eth1/63
***-Spine02.net.***.it
Eth1/52 120 BR Eth1/63
***-IPN01 Eth1/53 120 BR Ethernet1/53
***-IPN01 Eth1/54 120 BR Ethernet1/54

Each IPN node has an LLDP neighborship with the local and remote spine nodes and with the remote IPN node: 6 in total, for redundancy.

As you can see, also in terms of OSPF there are 6 OSPF adjacencies (again inside a VRF, this time called IPN and configured manually on the Nexus, using the .4 sub-interfaces); starting with the IPN01 of POD1:

***-IPN01# show ip ospf neighbors vrf IPN
OSPF Process ID MULTIPOD VRF IPN
Total number of neighbors: 6
Neighbor ID Pri State Up Time Address Interface
192.168.1.101 1 FULL/ – 3d22h 192.168.1.2 Eth1/49.4
192.168.1.102 1 FULL/ – 3d22h 192.168.1.6 Eth1/50.4
192.168.2.101 1 FULL/ – 3d02h 192.168.2.2 Eth1/51.4
192.168.2.102 1 FULL/ – 3d03h 192.168.2.6 Eth1/52.4
192.168.0.102 1 FULL/ – 4d00h 192.168.0.2 Eth1/53.4
192.168.0.102 1 FULL/ – 4d00h 192.168.0.6 Eth1/54.4

Concerning the IPN01 of POD2, we have:

***-IPN01# show ip ospf neighbors vrf IPN
OSPF Process ID MULTIPOD VRF IPN
Total number of neighbors: 6
Neighbor ID Pri State Up Time Address Interface
192.168.1.101 1 FULL/ – 4d01h 192.168.1.10 Eth1/49.4
192.168.1.102 1 FULL/ – 4d01h 192.168.1.14 Eth1/50.4
192.168.2.101 1 FULL/ – 3d05h 192.168.2.10 Eth1/51.4
192.168.2.102 1 FULL/ – 3d06h 192.168.2.14 Eth1/52.4
192.168.0.101 1 FULL/ – 4d03h 192.168.0.1 Eth1/53.4
192.168.0.101 1 FULL/ – 4d03h 192.168.0.5 Eth1/54.4

***-IPN01# sh ip route ospf-MULTIPOD vrf IPN
IP Route Table for VRF "IPN"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

10.1.8.0/21, ubest/mbest: 2/0
*via 192.168.1.10, Eth1/49.4, [110/20], 4d01h, ospf-MULTIPOD, type-2
*via 192.168.1.14, Eth1/50.4, [110/20], 4d01h, ospf-MULTIPOD, type-2
10.1.8.1/32, ubest/mbest: 2/0
*via 192.168.1.10, Eth1/49.4, [110/20], 4d01h, ospf-MULTIPOD, type-2
*via 192.168.1.14, Eth1/50.4, [110/20], 4d01h, ospf-MULTIPOD, type-2

*via 192.168.1.10, Eth1/49.4, [110/20], 4d01h, ospf-MULTIPOD, type-2
*via 192.168.1.14, Eth1/50.4, [110/20], 4d01h, ospf-MULTIPOD, type-2
10.1.12.65/32, ubest/mbest: 1/0
*via 192.168.1.10, Eth1/49.4, [110/2], 4d01h, ospf-MULTIPOD, intra
10.1.12.66/32, ubest/mbest: 1/0
*via 192.168.1.14, Eth1/50.4, [110/2], 4d01h, ospf-MULTIPOD, intra
10.1.16.0/21, ubest/mbest: 2/0
*via 192.168.2.10, Eth1/51.4, [110/20], 3d04h, ospf-MULTIPOD, type-2
*via 192.168.2.14, Eth1/52.4, [110/20], 3d04h, ospf-MULTIPOD, type-2

10.1.21.0/32, ubest/mbest: 1/0
*via 192.168.2.10, Eth1/51.4, [110/2], 3d04h, ospf-MULTIPOD, intra
10.1.21.1/32, ubest/mbest: 1/0
*via 192.168.2.14, Eth1/52.4, [110/2], 3d05h, ospf-MULTIPOD, intra
192.168.0.101/32, ubest/mbest: 2/0
*via 192.168.0.1, Eth1/53.4, [110/2], 4d02h, ospf-MULTIPOD, intra
*via 192.168.0.5, Eth1/54.4, [110/2], 4d02h, ospf-MULTIPOD, intra
192.168.1.0/30, ubest/mbest: 3/0
*via 192.168.0.1, Eth1/53.4, [110/2], 4d01h, ospf-MULTIPOD, intra
*via 192.168.0.5, Eth1/54.4, [110/2], 4d01h, ospf-MULTIPOD, intra
*via 192.168.1.10, Eth1/49.4, [110/2], 4d01h, ospf-MULTIPOD, intra
192.168.1.4/30, ubest/mbest: 3/0
*via 192.168.0.1, Eth1/53.4, [110/2], 4d01h, ospf-MULTIPOD, intra
*via 192.168.0.5, Eth1/54.4, [110/2], 4d01h, ospf-MULTIPOD, intra
*via 192.168.1.14, Eth1/50.4, [110/2], 4d01h, ospf-MULTIPOD, intra
192.168.1.101/32, ubest/mbest: 1/0
*via 192.168.1.10, Eth1/49.4, [110/2], 4d01h, ospf-MULTIPOD, intra
192.168.1.102/32, ubest/mbest: 1/0
*via 192.168.1.14, Eth1/50.4, [110/2], 4d01h, ospf-MULTIPOD, intra
192.168.1.254/32, ubest/mbest: 2/0
*via 192.168.1.10, Eth1/49.4, [110/20], 4d01h, ospf-MULTIPOD, type-2
*via 192.168.1.14, Eth1/50.4, [110/20], 4d01h, ospf-MULTIPOD, type-2
192.168.2.0/30, ubest/mbest: 3/0
*via 192.168.0.1, Eth1/53.4, [110/2], 3d04h, ospf-MULTIPOD, intra
*via 192.168.0.5, Eth1/54.4, [110/2], 3d04h, ospf-MULTIPOD, intra
*via 192.168.2.10, Eth1/51.4, [110/2], 3d04h, ospf-MULTIPOD, intra

192.168.2.254/32, ubest/mbest: 2/0
*via 192.168.2.10, Eth1/51.4, [110/20], 3d04h, ospf-MULTIPOD, type-2
*via 192.168.2.14, Eth1/52.4, [110/20], 3d04h, ospf-MULTIPOD, type-2

…where some prefixes are seen as intra (OSPF intra-area routes) and others as external type-2 routes, the latter being the result of the mutual redistribution with ISIS.

In order to have a clearer view of the intra-area versus external type-2 nature of the prefixes, let's examine the OSPF link-state database:

***-IPN01# show ip ospf database vrf IPN
OSPF Router with ID (192.168.0.102) (Process ID MULTIPOD VRF IPN)

Router Link States (Area 0.0.0.0)

Link ID ADV Router Age Seq# Checksum Link Count
192.168.0.101 192.168.0.101 1017 0x800000dd 0x7ffc 14
192.168.0.102 192.168.0.102 966 0x800000ef 0xed39 14
192.168.1.101 192.168.1.101 616 0x800000c9 0xe631 6
192.168.1.102 192.168.1.102 1775 0x800000ca 0x58aa 6
192.168.2.101 192.168.2.101 1607 0x800000a8 0xc5a4 6
192.168.2.102 192.168.2.102 548 0x800000a3 0x4318 6

Type-5 AS External Link States

Link ID ADV Router Age Seq# Checksum Tag
10.1.8.0 192.168.1.101 1786 0x800000c3 0xa877 0
10.1.8.0 192.168.1.102 1775 0x800000c2 0xa47b 0
10.1.8.1 192.168.1.101 1786 0x800000c3 0xc156 0
10.1.8.1 192.168.1.102 1775 0x800000c2 0xbd5a 0
10.1.8.2 192.168.1.101 1786 0x800000c3 0xb75f 0
10.1.8.2 192.168.1.102 1775 0x800000c2 0xb363 0
10.1.8.3 192.168.2.101 1337 0x8000006f 0x4f1a 0
10.1.8.3 192.168.2.102 1338 0x8000006f 0x491f 0
10.1.8.33 192.168.1.101 1786 0x800000c3 0x8077 0
10.1.8.33 192.168.1.102 1775 0x800000c2 0x7c7b 0
10.1.8.34 192.168.1.101 16 0x800000c3 0x7680 0
10.1.8.34 192.168.1.102 1775 0x800000c2 0x7284 0
10.1.8.35 192.168.1.101 16 0x800000c3 0x6c89 0
10.1.8.35 192.168.1.102 1775 0x800000c2 0x688d 0
10.1.16.0 192.168.2.101 556 0x8000009c 0x97a6 0
10.1.16.0 192.168.2.102 548 0x8000009c 0x91ab 0
10.1.16.33 192.168.2.101 556 0x8000009c 0x6fa6 0
10.1.16.33 192.168.2.102 548 0x8000009c 0x69ab 0
10.1.16.34 192.168.2.101 556 0x8000009c 0x65af 0
10.1.16.34 192.168.2.102 548 0x8000009c 0x5fb4 0
10.1.16.35 192.168.2.101 556 0x8000009c 0x5bb8 0
10.1.16.35 192.168.2.102 548 0x8000009c 0x55bd 0
192.168.1.101 192.168.1.102 1775 0x800000c2 0xfb60 0
192.168.1.102 192.168.1.101 546 0x800000c2 0xf764 0
192.168.1.254 192.168.1.101 1786 0x800000c3 0xffc2 0
192.168.1.254 192.168.1.102 1775 0x800000c2 0xfbc6 0
192.168.2.101 192.168.2.102 548 0x8000009c 0x364a 0
192.168.2.102 192.168.2.101 556 0x8000009c 0x324e 0
192.168.2.254 192.168.2.101 556 0x8000009c 0x3cab 0
192.168.2.254 192.168.2.102 548 0x8000009c 0x36b0 0

So, briefly, OSPF is up and running between the IPN and spine nodes, their loopbacks are reciprocally announced in OSPF, and the MP-BGP EVPN can be put in place… but first let's check the MP-BGP VPNv4 used internally in each POD between the spines, acting as route reflectors, and the leaves, acting as… PEs??? …roles of the traditional world, borrowed from the ISP environment! 🙂

Concerning the SPINE01 of POD1, we have:

***-Spine01# show bgp vpnv4 unicast summary vrf overlay-1
BGP summary information for VRF overlay-1, address family VPNv4 Unicast
BGP router identifier 192.168.1.101, local AS number 65000
BGP table version is 15, VPNv4 Unicast config peers 9, capable peers 8
0 network entries and 0 paths using 0 bytes of memory
BGP attribute entries [0/0], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.1.12.64 4 65000 4308 4308 15 0 0 2d23h 0
10.1.12.67 4 65000 4308 4306 15 0 0 2d23h 0
10.1.12.68 4 65000 4306 4308 15 0 0 2d23h 0
10.1.12.69 4 65000 4308 4308 15 0 0 2d23h 0
10.1.12.70 4 65000 4308 4308 15 0 0 2d23h 0
10.1.12.71 4 65000 4306 4308 15 0 0 2d23h 0
192.168.2.101 4 65000 4310 4306 15 0 0 2d23h 0
192.168.2.102 4 65000 4310 4306 15 0 0 2d23h 0

As expected, we have 6 MP-BGP VPNv4 sessions with the 6 leaves of POD1 and 2 MP-BGP VPNv4 sessions with the remote spine nodes of POD2.

Concerning the SPINE01 of POD2, we have:

***-Spine01# show bgp vpnv4 unicast summary vrf overlay-1
BGP summary information for VRF overlay-1, address family VPNv4 Unicast
BGP router identifier 192.168.2.101, local AS number 65000
BGP table version is 16, VPNv4 Unicast config peers 7, capable peers 6
0 network entries and 0 paths using 0 bytes of memory
BGP attribute entries [0/0], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.1.21.2 4 65000 4225 4225 16 0 0 2d22h 0
10.1.21.3 4 65000 4311 4307 16 0 0 2d23h 0
10.1.21.4 4 65000 4305 4307 16 0 0 2d23h 0
10.1.21.5 4 65000 4226 4224 16 0 0 2d22h 0
192.168.1.101 4 65000 4305 4308 16 0 0 2d23h 0
192.168.1.102 4 65000 4305 4308 16 0 0 2d23h 0

As expected, we have 4 MP-BGP VPNv4 sessions with the 4 leaves of POD2 and 2 MP-BGP VPNv4 sessions with the remote spine nodes of POD1.

From a pure L3 point of view, the L3VPN framework, imported into the DC infrastructure, is ready to work; the leaves of both PODs are in MP-BGP L3VPN peering through the spine nodes acting as route reflectors, and any VRFs configured on the ACI leaf access switches (or routers, at this point? 🙂) can be put in communication over the Multi-POD solution.

The spine nodes, however, also realize an MP-BGP EVPN peering among themselves, in order to propagate the L2 information: the MAC addresses of the endpoints connected to the leaves in both PODs.

Concerning the SPINE01 of POD1, we have:

***-Spine01# show bgp l2vpn evpn summary vrf overlay-1
BGP summary information for VRF overlay-1, address family L2VPN EVPN
BGP router identifier 192.168.1.101, local AS number 65000
BGP table version is 47, L2VPN EVPN config peers 2, capable peers 2
18 network entries and 23 paths using 3744 bytes of memory
BGP attribute entries [3/432], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
192.168.2.101 4 65000 4314 4310 47 0 0 2d23h 5
192.168.2.102 4 65000 4314 4310 47 0 0 2d23h 5

Concerning the SPINE01 of POD2, we have:

***-Spine01# show bgp l2vpn evpn summary vrf overlay-1
BGP summary information for VRF overlay-1, address family L2VPN EVPN
BGP router identifier 192.168.2.101, local AS number 65000
BGP table version is 45, L2VPN EVPN config peers 2, capable peers 2
21 network entries and 29 paths using 4368 bytes of memory
BGP attribute entries [3/432], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
192.168.1.101 4 65000 4311 4314 45 0 0 2d23h 8
192.168.1.102 4 65000 4311 4314 45 0 0 2d23h 8

Finally, it's time to see how the MAC address of APIC3 is propagated from POD2 to POD1 through the MP-BGP EVPN configured among the spine nodes of the two PODs. As already said, the anycast IP addresses 192.168.1.254 and 192.168.2.254, configured on the two couples of spines, will be used as next hops… be patient, we are going to see it in a few seconds! 🙂

Concerning the SPINE01 of POD2, we have:

***-Spine01# show bgp l2vpn evpn vrf overlay-1
BGP routing table information for VRF overlay-1, address family L2VPN EVPN
BGP table version is 45, local router ID is 192.168.2.101
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i – IGP, e – EGP, ? – incomplete, | – multipath, & – backup

Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 192.168.1.254:65000
*>i[2]:[0]:[16777209]:[48]:[00be.75e0.90ec]:[0]:[0.0.0.0]/216
192.168.1.254 100 0 i
* i 192.168.1.254 100 0 i
*>i[2]:[0]:[16777209]:[48]:[00be.75e0.a0f0]:[0]:[0.0.0.0]/216
192.168.1.254 100 0 i
* i 192.168.1.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.64]/272
192.168.1.254 100 0 i
* i 192.168.1.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.67]/272
192.168.1.254 100 0 i
* i 192.168.1.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.68]/272
192.168.1.254 100 0 i
* i 192.168.1.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.69]/272
192.168.1.254 100 0 i
* i 192.168.1.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.70]/272
192.168.1.254 100 0 i
* i 192.168.1.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.71]/272
192.168.1.254 100 0 i
* i 192.168.1.254 100 0 i

Route Distinguisher: 192.168.2.254:65000 (L2VNI 1)
*>i[2]:[0]:[16777209]:[48]:[00be.75e0.90ec]:[0]:[0.0.0.0]/216
192.168.1.254 100 0 i
*>i[2]:[0]:[16777209]:[48]:[00be.75e0.a0f0]:[0]:[0.0.0.0]/216
192.168.1.254 100 0 i
*>l[2]:[0]:[16777209]:[48]:[00be.75e0.aabc]:[0]:[0.0.0.0]/216
192.168.2.254 100 32768 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.64]/272
192.168.1.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.67]/272
192.168.1.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.68]/272
192.168.1.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.69]/272
192.168.1.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.70]/272
192.168.1.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.71]/272
192.168.1.254 100 0 i
*>l[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.21.2]/272
192.168.2.254 100 32768 i
*>l[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.21.3]/272
192.168.2.254 100 32768 i
*>l[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.21.4]/272
192.168.2.254 100 32768 i
*>l[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.21.5]/272
192.168.2.254 100 32768 i

Concerning the SPINE01 of POD1, we have:

***-Spine01# show bgp l2vpn evpn vrf overlay-1
BGP routing table information for VRF overlay-1, address family L2VPN EVPN
BGP table version is 47, local router ID is 192.168.1.101
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i – IGP, e – EGP, ? – incomplete, | – multipath, & – backup

Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 192.168.1.254:65000 (L2VNI 1)
*>l[2]:[0]:[16777209]:[48]:[00be.75e0.90ec]:[0]:[0.0.0.0]/216
192.168.1.254 100 32768 i
*>l[2]:[0]:[16777209]:[48]:[00be.75e0.a0f0]:[0]:[0.0.0.0]/216
192.168.1.254 100 32768 i
*>i[2]:[0]:[16777209]:[48]:[00be.75e0.aabc]:[0]:[0.0.0.0]/216
192.168.2.254 100 0 i
*>l[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.64]/272
192.168.1.254 100 32768 i
*>l[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.67]/272
192.168.1.254 100 32768 i
*>l[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.68]/272
192.168.1.254 100 32768 i
*>l[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.69]/272
192.168.1.254 100 32768 i
*>l[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.70]/272
192.168.1.254 100 32768 i
*>l[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.12.71]/272
192.168.1.254 100 32768 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.21.2]/272
192.168.2.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.21.3]/272
192.168.2.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.21.4]/272
192.168.2.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.21.5]/272
192.168.2.254 100 0 i

Route Distinguisher: 192.168.2.254:65000
*>i[2]:[0]:[16777209]:[48]:[00be.75e0.aabc]:[0]:[0.0.0.0]/216
192.168.2.254 100 0 i
* i 192.168.2.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.21.2]/272
192.168.2.254 100 0 i
* i 192.168.2.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.21.3]/272
192.168.2.254 100 0 i
* i 192.168.2.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.21.4]/272
192.168.2.254 100 0 i
* i 192.168.2.254 100 0 i
*>i[2]:[0]:[16777199]:[48]:[0200.0000.0001]:[32]:[10.1.21.5]/272
192.168.2.254 100 0 i
* i 192.168.2.254 100 0 i

As is already well known from the ISP L3VPN environment, the APIC3 MAC address is propagated via MP-BGP EVPN from the spines of POD2 to the spines of POD1: the latter receive it with the Route Distinguisher 192.168.2.254:65000 and import it into the local table with the Route Distinguisher 192.168.1.254:65000; the next hop used is the anycast IP address of the couple of spines generating the MP-BGP EVPN update, in this example 192.168.2.254.

Let's try now to follow the path that the APIC3 MAC address entry takes to be propagated from POD2 to the POD1 leaves.
As already seen previously, the APIC3 MAC address is learned by Leaf03 of POD2 on Eth1/48 in VLAN 13 (13, remember, was the infra VLAN of the overlay-1 VRF).

***-Leaf03# show endpoint
Legend:
s – arp O – peer-attached a – local-aged S – static
V – vpc-attached p – peer-aged M – span L – local
B – bounce H – vtep
+———————————–+—————+—————–+————–+————-+
VLAN/ Encap MAC Address MAC Info/ Interface
Domain VLAN IP Address IP Info
+———————————–+—————+—————–+————–+————-+
overlay-1 10.1.21.5 L lo0
13/overlay-1 vxlan-16777209 00be.75e0.aabc L eth1/48

Concerning the SPINE01 of POD2, we have:

***-Spine01# show bgp l2vpn evpn 00be.75e0.aabc vrf overlay-1
Route Distinguisher: 192.168.2.254:65000 (L2VNI 1)
BGP routing table entry for [2]:[0]:[16777209]:[48]:[00be.75e0.aabc]:[0]:[0.0.0.0]/216, version 45 dest ptr 0xaa976f0e
Paths: (1 available, best #1)
Flags: (0x000102 00000000) on xmit-list, is not in rib/evpn
Multipath: eBGP iBGP

Advertised path-id 1
Path type: local 0x4000008c 0x0 ref 0, path is valid, is best path
AS-Path: NONE, path locally originated
192.168.2.254 (metric 0) from 0.0.0.0 (192.168.2.101)
Origin IGP, MED not set, localpref 100, weight 32768
Received label 16777209
Extcommunity:
RT:5:16

Path-id 1 advertised to peers:
192.168.1.101 192.168.1.102

Concerning the SPINE01 of POD1, we have:

***-Spine01# show bgp l2vpn evpn 00be.75e0.aabc vrf overlay-1
Route Distinguisher: 192.168.1.254:65000 (L2VNI 1)
BGP routing table entry for [2]:[0]:[16777209]:[48]:[00be.75e0.aabc]:[0]:[0.0.0.0]/216, version 94 dest ptr 0xaa91a0dc
Paths: (1 available, best #1)
Flags: (0x00021a 0x000009) on xmit-list, is in rib/evpn, is in l2rib mpod shard, is in l2rib
Multipath: eBGP iBGP

Advertised path-id 1
Path type: internal 0xc0000018 0x40 ref 56506, path is valid, is best path
Imported from 192.168.2.254:65000:[2]:[0]:[16777209]:[48]:[00be.75e0.aabc]:[0]:[0.0.0.0]/112
AS-Path: NONE, path sourced internal to AS
192.168.2.254 (metric 20) from 192.168.2.101 (192.168.2.101)
Origin IGP, MED not set, localpref 100, weight 0
Received label 16777209
Received path-id 1
Extcommunity:
RT:5:16
ENCAP:8
Router MAC:0200.c0a8.02fe

Path-id 1 not advertised to any peer

Route Distinguisher: 192.168.2.254:65000
BGP routing table entry for [2]:[0]:[16777209]:[48]:[00be.75e0.aabc]:[0]:[0.0.0.0]/216, version 89 dest ptr 0xaa91a176
Paths: (2 available, best #2)
Flags: (0x000202 00000000) on xmit-list, is not in rib/evpn, is locked
Multipath: eBGP iBGP

Path type: internal 0x40000018 0x2040 ref 0, path is valid, not best reason: Router Id
AS-Path: NONE, path sourced internal to AS
192.168.2.254 (metric 20) from 192.168.2.102 (192.168.2.102)
Origin IGP, MED not set, localpref 100, weight 0
Received label 16777209
Received path-id 1
Extcommunity:
RT:5:16
ENCAP:8
Router MAC:0200.c0a8.02fe

Advertised path-id 1
Path type: internal 0x40000018 0x2040 ref 1, path is valid, is best path
AS-Path: NONE, path sourced internal to AS
192.168.2.254 (metric 20) from 192.168.2.101 (192.168.2.101)
Origin IGP, MED not set, localpref 100, weight 0
Received label 16777209
Received path-id 1
Extcommunity:
RT:5:16
ENCAP:8
Router MAC:0200.c0a8.02fe

Path-id 1 not advertised to any peer

So, the entry 00be.75e0.aabc (APIC3's MAC address) is learned by Leaf03 and propagated to the spines of POD2 via COOP; the entry is then propagated via MP-BGP EVPN to the spines of POD1. Since there are two spines in POD2, each one propagates the entry to the spines of POD1, which import, under the Route Distinguisher 192.168.1.254:65000 (L2VNI 1), the best of the two entries received. The choice of the best one is made by the BGP best-path selection algorithm, which in this scenario prefers the spine with the lowest Router ID, i.e. 192.168.2.101.
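
If you want to double-check that the entry really sits in the spines' COOP database (fed locally via COOP by the POD2 leaves, and via MP-BGP EVPN on the POD1 side), a repo lookup keyed by the bridge-domain VNID and the MAC can be used on a spine; 16777209 is the infra VNID we have seen in all the previous outputs (the exact command syntax may vary with the software release):

***-Spine01# show coop internal info repo ep key 16777209 00be.75e0.aabc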

Keep this figure in mind to understand the outputs:

[Figure: propagation of the APIC3 MAC address from POD2 to POD1]

We are at the end of this novel 🙂 and, as happens in all novels, at the end there is a summary of the story:

Initially, each leaf switch knows nothing about the remote endpoints. This behavior is normal and expected. However, the spine switches in both PODs know about the endpoints through MP-BGP EVPN and populate their COOP databases accordingly.

If a virtual machine in POD1 wants to communicate with a virtual machine in POD2, the destination endpoint is unknown to the leaf switch. This behavior is expected and normal, and the leaf switch directs the traffic to its local spine proxy. The local spine switch finds the remote endpoint, learned via MP-BGP EVPN, in its COOP database. It then sets the DIPO (the outer destination IP) to the anycast proxy address of the remote POD's spines (in our case, the 192.168.2.254 seen before). The traffic is forwarded into the IPN because the anycast TEP next hop is known through the OSPF peering. Note that traffic is never sent from a spine proxy in one POD directly to a leaf switch in a different POD.

The remote spine switch receives traffic and determines whether the inner destination endpoint is local. It then sends the traffic to the appropriate leaf switch. During this process, the source endpoint and the source Physical TEP (PTEP) or leaf switch are learned from the traffic flow. With this information, a dynamic tunnel is created from the POD2 leaf switch to the POD1 leaf switch for reverse traffic to use. The reverse traffic will build a complete dynamic tunnel between the two VTEPs or leaf switches. From that point onward, the two endpoints’ communication will be encapsulated leaf to leaf (VTEP to VTEP). The dynamic tunnels, as with normal tunnels in the fabric, are kept alive as long as there is communication between endpoints.