Cisco 6509 NAT Investigation Notes
2012-01-11 Test Matrix
| TestName | cmd | Server | Client | Result |
| dccp-UC | dccp -d 63 -H /pnfs/uchicago.edu/mwt2testdisk/iut2-s1_1 /tmp/aaa | iut2-s1 | uct2-c074 | hangs |
| dcap-UC | | | | |
| dcap-IU | | | | |
| xrd | | | | |
| xrd-UC | | | | |
| xrd-IU | | | | |
| dcache-xrd | | | | |
| dcache-xrd-UC | | | | |
| dcache-xrd-IU | | | | |
2011-09-20
Changed the NAT translation tcp-timeout from 600 to 6000.
Reran the test with 150 clients x 100 transfers each. In the attached plots, the 9-19 plots show the first test from 9-19, which had 5 stuck transfers at the end. The new results from 9-20 seem to have only one stuck transfer, at the very start. Java ran at about 360% CPU during this time, and iut2-s1 had a load of ~6.
Attached plots:
- iut2-s1, 150 UC NAT'd clients, 100 transfers each
- 9-19-2011, 150 NAT'd clients: trunk errors
- 9-19-2011, 150 NAT'd clients: trunk throughput
- 9192011_iut2-s1-traffic-150natclient100dcap.jpg
2011-09-19
Attached the current config.
Considering increasing these values to stop TCP timeouts on dcap connections from NAT'd hosts to iut2-s*:
ip nat translation tcp-timeout 600
ip nat translation udp-timeout 600
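Why a larger tcp-timeout might help can be illustrated with a toy model of a PAT (overload) table that drops idle entries. This is a hypothetical simplification, not how IOS actually manages its translation table: the class ToyPatTable, the starting outside port 5551 (borrowed from the server-side dump further down) and the inside socket 10.1.3.74:40000 are made up for illustration. In the model, a DCAP connection that sits idle longer than the timeout loses its mapping. Its next packet then leaves on a different outside port, which the server does not associate with the existing connection.

```python
class ToyPatTable:
    """Toy PAT (NAT overload) table with an idle timeout.

    Hypothetical simplification: every packet from an inside (ip, port)
    refreshes its entry; an entry idle longer than tcp_timeout seconds is
    treated as expired, and the next packet gets a fresh outside port.
    """

    def __init__(self, tcp_timeout, first_port=5551):
        self.tcp_timeout = tcp_timeout
        self.next_port = first_port  # hypothetical first outside port
        self.entries = {}  # (inside_ip, inside_port) -> [outside_port, last_used]

    def translate(self, inside, now):
        entry = self.entries.get(inside)
        if entry is not None and now - entry[1] <= self.tcp_timeout:
            entry[1] = now  # traffic refreshes the idle timer
            return entry[0]
        # Entry missing or expired: allocate a new outside port. The remote
        # server has no connection on this port, so a long-lived flow breaks.
        port = self.next_port
        self.next_port += 1
        self.entries[inside] = [port, now]
        return port


if __name__ == "__main__":
    flow = ("10.1.3.74", 40000)  # hypothetical worker-node socket
    for timeout in (600, 6000):
        pat = ToyPatTable(tcp_timeout=timeout)
        first = pat.translate(flow, now=0)
        pat.translate(flow, now=500)           # traffic at t=500 s
        later = pat.translate(flow, now=1200)  # next packet after 700 s idle
        print(timeout, "same port" if later == first else "new port")
```

Run as a script, it prints `600 new port` followed by `6000 same port`. With the 600 s timeout, the 700 s idle gap is enough to lose the mapping. With 6000 s it survives.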
Testing iut2-s1 with a modified blessing.sh, which runs runreadtest 1024MB 100 $tot_client $method to test dcap reads:
/home/nathany/blessing/blessing.sh 5
This shows 20 slow movers for iut2-s1_1 in
http://uct2-dc4.uchicago.edu:2288/queueInfo
example transfer from
http://uct2-dc4.uchicago.edu:2288/context/transfers.html
DCap-uct2-s1-Unknown-435706 dcap-uct2-s1Domain 1 dcap-3 nathany 1911 000078ECEBBE09544FFE965D4470C943FDCC iut2-s1_1 uct2-c009.mwt2.org
WaitingForDoorTransferOk 00:00:23 A 453375 19212
2011-03-29 Updates
- 2011-03-29 - Uploaded a new 6509 configuration, which contains the following changes:
- Removed flow control from the upstream link to campus, since the campus side does not support flow control:
interface Te1/1
no flowcontrol receive on
no flowcontrol send on
- Changed the incorrectly set IP helper address for VLAN 101 from the broadcast address to the IP of uct2-mgt:
interface Vlan101
no ip helper-address 10.1.255.255
ip helper-address 10.1.2.232
- Removed the helper address from VLAN 624:
interface Vlan624
no ip helper-address 10.1.2.232
ip classless
- Removed VLAN access info from the trunked ports (Te1/2, Te2/2):
no switchport access vlan 624
- Removed VLAN trunk info from the access ports (Ge3/2 - Ge3/34):
no switchport trunk encapsulation dot1q
no switchport trunk allowed vlan 101,624
- 2011-03-16 - Uploaded a new 6509 configuration, which contains the following changes:
no ip route 10.0.0.0 255.255.0.0 128.135.158.129
ip route 10.0.0.0 255.0.0.0 vlan 101
ip route 0.0.0.0 0.0.0.0 128.135.158.129
no ip default-network 128.135.0.0
no ip default-network 10.0.0.0
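The 2011-03-16 change replaces the 10.0.0.0/16 route that pointed upstream with two routes: a 10.0.0.0/8 route out Vlan101 and an explicit default route to 128.135.158.129. The following minimal sketch uses Python's ipaddress module to show how longest-prefix matching splits destinations between these two static routes. It deliberately ignores connected routes, administrative distance and everything else IOS takes into account. The lookup helper is hypothetical.

```python
import ipaddress

# Static routes after the 2011-03-16 change, written as prefixes:
#   ip route 10.0.0.0 255.0.0.0 vlan 101        -> 10.0.0.0/8
#   ip route 0.0.0.0 0.0.0.0 128.135.158.129    -> 0.0.0.0/0
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), "Vlan101"),
    (ipaddress.ip_network("0.0.0.0/0"), "128.135.158.129"),
]


def lookup(dst, routes=ROUTES):
    """Return the next hop of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None
    # The most specific route (largest prefix length) wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]
```

For example, `lookup("10.1.2.232")` selects Vlan101, while an external address such as iut2-s1's 149.165.225.225 falls through to the default route.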
- 2011-03-10 - Uploaded a new 6509 configuration with the following changes to remove NetFlow accounting and PIM dense mode:
1. Under interface Vlan101:
no ip pim dense-mode
no ip route-cache flow
2. Under interface Vlan624:
no ip route-cache flow
Summary
We have been experiencing intermittent network connection failures within the MWT2 cluster. We have traced them to connections that originate from within the internal IP range (10.1.x.x) and whose destination is external (NAT'd connections).
We have 280 worker nodes and 25 head nodes, all connected through the 6509. The workers are NAT'd behind a single IP address, which is also the IP address of the 6509.
Symptoms
Our primary symptom is stuck jobs caused by failed access to our d-cache system. The first hint that this was NAT-related came when one of our nodes was not configured with the static routes to the storage servers and was therefore reaching the d-cache system through the NAT. The DCAP protocol is notoriously sensitive to network latency, packet loss and broken connections, so it is often the first thing to fail when we have network problems. In this case, the worker node accessing d-cache through the NAT failed regularly, while the other workers continued to operate normally. As soon as we added back the static routes, the problem went away. We saw this again when we added a storage server at the IU side of our cluster: now ALL the nodes at UC regularly failed when connecting to the IU storage node, whereas all the IU nodes (all on public IP addresses) had no trouble accessing data from both sites.
There are a few ways to reproduce this failure outside of d-cache. The simplest is a loop of wget requests like the following:
for i in {1..5000}; do echo "RUN $i"; wget http://iut2-s1.iu.edu >>/tmp/wget.output 2>&1; /bin/rm index.html; done;
In the working cases, this quickly and reliably fetches 5000 copies of the very simple HTML page served from iut2-s1.iu.edu. In the broken cases, the loop stalls with one of the following errors in /tmp/wget.output, sometimes only after more than 1000 requests have gone through. In either case wget retries, and the transfer eventually succeeds.
Connecting to iut2-s1.iu.edu|149.165.225.225|:80... failed: Connection timed out.
Connecting to iut2-s1.iu.edu|149.165.225.225|:80... failed: No route to host.
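Tallying the log shows how often the loop hits these errors. The following minimal sketch assumes the log contains wget's usual "Connecting to host|ip|:port... result" lines, as in the two examples above. The RUN markers echoed by the loop go to the terminal rather than to /tmp/wget.output, so the sketch counts connection attempts (including wget's retries) instead of runs. The function name summarize is hypothetical.

```python
import re
from collections import Counter

# wget logs each connection attempt as "Connecting to host|ip|:port... <result>".
ATTEMPT_RE = re.compile(r"^Connecting to \S+\.\.\. (.*)$")


def summarize(log_text):
    """Return (attempts, Counter of failure reasons) from a wget log."""
    attempts = 0
    failures = Counter()
    for line in log_text.splitlines():
        m = ATTEMPT_RE.match(line.strip())
        if m is None:
            continue
        attempts += 1
        result = m.group(1)
        if result.startswith("failed: "):
            # e.g. "failed: Connection timed out." -> "Connection timed out"
            failures[result[len("failed: "):].rstrip(".")] += 1
    return attempts, failures
```

To use it on the real log, read /tmp/wget.output into a string and pass it to summarize.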
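When we run tcpdump on both the web server and the client, we can clearly see that some packets never arrive. In the dumps below, the first two connections (client ports 37766 and 37767, translated by the 6509 to ports 5551 and 5553) complete normally. For the third connection (client port 37768, translated to 5554), the server answers every SYN retransmission with a SYN-ACK, but none of those SYN-ACKs reach the client, which keeps retransmitting its SYN. The complete tcpdump output is attached to this page.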
DUMP FROM uct2-grid3 (client):
10:36:25.255690 IP uct2-grid3.mwt2.org.37766 > iut2-s1.iu.edu.http: S 2249776629:2249776629(0) win 5840
10:36:25.261791 IP iut2-s1.iu.edu.http > uct2-grid3.mwt2.org.37766: S 4262417301:4262417301(0) ack 2249776630 win 5840
10:36:25.261806 IP uct2-grid3.mwt2.org.37766 > iut2-s1.iu.edu.http: . ack 1 win 12
10:36:25.261844 IP uct2-grid3.mwt2.org.37766 > iut2-s1.iu.edu.http: P 1:122(121) ack 1 win 12
10:36:25.267404 IP iut2-s1.iu.edu.http > uct2-grid3.mwt2.org.37766: . ack 122 win 12
10:36:25.267473 IP iut2-s1.iu.edu.http > uct2-grid3.mwt2.org.37766: P 1:289(288) ack 122 win 12
10:36:25.267481 IP uct2-grid3.mwt2.org.37766 > iut2-s1.iu.edu.http: . ack 289 win 14
10:36:25.267642 IP iut2-s1.iu.edu.http > uct2-grid3.mwt2.org.37766: F 289:289(0) ack 122 win 12
10:36:25.268080 IP uct2-grid3.mwt2.org.37766 > iut2-s1.iu.edu.http: F 122:122(0) ack 290 win 14
10:36:25.273404 IP iut2-s1.iu.edu.http > uct2-grid3.mwt2.org.37766: . ack 123 win 12
10:36:25.274416 IP uct2-grid3.mwt2.org.37767 > iut2-s1.iu.edu.http: S 2247415541:2247415541(0) win 5840
10:36:25.280127 IP iut2-s1.iu.edu.http > uct2-grid3.mwt2.org.37767: S 4264156713:4264156713(0) ack 2247415542 win 5840
10:36:25.280136 IP uct2-grid3.mwt2.org.37767 > iut2-s1.iu.edu.http: . ack 1 win 12
10:36:25.280173 IP uct2-grid3.mwt2.org.37767 > iut2-s1.iu.edu.http: P 1:122(121) ack 1 win 12
10:36:25.285748 IP iut2-s1.iu.edu.http > uct2-grid3.mwt2.org.37767: . ack 122 win 12
10:36:25.285838 IP iut2-s1.iu.edu.http > uct2-grid3.mwt2.org.37767: P 1:289(288) ack 122 win 12
10:36:25.285845 IP uct2-grid3.mwt2.org.37767 > iut2-s1.iu.edu.http: . ack 289 win 14
10:36:25.285984 IP iut2-s1.iu.edu.http > uct2-grid3.mwt2.org.37767: F 289:289(0) ack 122 win 12
10:36:25.286453 IP uct2-grid3.mwt2.org.37767 > iut2-s1.iu.edu.http: F 122:122(0) ack 290 win 14
10:36:25.291802 IP iut2-s1.iu.edu.http > uct2-grid3.mwt2.org.37767: . ack 123 win 12
10:36:25.292893 IP uct2-grid3.mwt2.org.37768 > iut2-s1.iu.edu.http: S 2253776508:2253776508(0) win 5840
10:36:28.292497 IP uct2-grid3.mwt2.org.37768 > iut2-s1.iu.edu.http: S 2253776508:2253776508(0) win 5840
10:36:34.293362 IP uct2-grid3.mwt2.org.37768 > iut2-s1.iu.edu.http: S 2253776508:2253776508(0) win 5840
10:36:46.294588 IP uct2-grid3.mwt2.org.37768 > iut2-s1.iu.edu.http: S 2253776508:2253776508(0) win 5840
10:37:10.298001 IP uct2-grid3.mwt2.org.37768 > iut2-s1.iu.edu.http: S 2253776508:2253776508(0) win 5840
10:37:58.302959 IP uct2-grid3.mwt2.org.37768 > iut2-s1.iu.edu.http: S 2253776508:2253776508(0) win 5840
DUMP FROM iut2-s1 (server):
10:36:25.258722 IP uct2-6509.uchicago.edu.5551 > iut2-s1.iu.edu.http: S 2249776629:2249776629(0) win 5840
10:36:25.258729 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5551: S 4262417301:4262417301(0) ack 2249776630 win 5840
10:36:25.264650 IP uct2-6509.uchicago.edu.5551 > iut2-s1.iu.edu.http: . ack 1 win 12
10:36:25.264726 IP uct2-6509.uchicago.edu.5551 > iut2-s1.iu.edu.http: P 1:122(121) ack 1 win 12
10:36:25.264733 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5551: . ack 122 win 12
10:36:25.264826 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5551: P 1:289(288) ack 122 win 12
10:36:25.264844 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5551: F 289:289(0) ack 122 win 12
10:36:25.269954 IP uct2-6509.uchicago.edu.5551 > iut2-s1.iu.edu.http: . ack 289 win 14
10:36:25.270724 IP uct2-6509.uchicago.edu.5551 > iut2-s1.iu.edu.http: F 122:122(0) ack 290 win 14
10:36:25.270729 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5551: . ack 123 win 12
10:36:25.277078 IP uct2-6509.uchicago.edu.5553 > iut2-s1.iu.edu.http: S 2247415541:2247415541(0) win 5840
10:36:25.277085 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5553: S 4264156713:4264156713(0) ack 2247415542 win 5840
10:36:25.282995 IP uct2-6509.uchicago.edu.5553 > iut2-s1.iu.edu.http: . ack 1 win 12
10:36:25.283074 IP uct2-6509.uchicago.edu.5553 > iut2-s1.iu.edu.http: P 1:122(121) ack 1 win 12
10:36:25.283081 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5553: . ack 122 win 12
10:36:25.283158 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5553: P 1:289(288) ack 122 win 12
10:36:25.283180 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5553: F 289:289(0) ack 122 win 12
10:36:25.288362 IP uct2-6509.uchicago.edu.5553 > iut2-s1.iu.edu.http: . ack 289 win 14
10:36:25.289090 IP uct2-6509.uchicago.edu.5553 > iut2-s1.iu.edu.http: F 122:122(0) ack 290 win 14
10:36:25.289096 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5553: . ack 123 win 12
10:36:25.295558 IP uct2-6509.uchicago.edu.5554 > iut2-s1.iu.edu.http: S 2253776508:2253776508(0) win 5840
10:36:25.295565 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5554: S 4273033759:4273033759(0) ack 2253776509 win 5840
10:36:28.295162 IP uct2-6509.uchicago.edu.5554 > iut2-s1.iu.edu.http: S 2253776508:2253776508(0) win 5840
10:36:28.295170 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5554: S 4276033364:4276033364(0) ack 2253776509 win 5840
10:36:34.296239 IP uct2-6509.uchicago.edu.5554 > iut2-s1.iu.edu.http: S 2253776508:2253776508(0) win 5840
10:36:34.296247 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5554: S 4282034441:4282034441(0) ack 2253776509 win 5840
10:36:46.297511 IP uct2-6509.uchicago.edu.5554 > iut2-s1.iu.edu.http: S 2253776508:2253776508(0) win 5840
10:36:46.297523 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5554: S 4294035715:4294035715(0) ack 2253776509 win 5840
10:37:10.301003 IP uct2-6509.uchicago.edu.5554 > iut2-s1.iu.edu.http: S 2253776508:2253776508(0) win 5840
10:37:10.301024 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5554: S 23071916:23071916(0) ack 2253776509 win 5840
10:37:14.101871 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5554: S 23071916:23071916(0) ack 2253776509 win 5840
10:37:58.305716 IP uct2-6509.uchicago.edu.5554 > iut2-s1.iu.edu.http: S 2253776508:2253776508(0) win 5840
10:37:58.305728 IP iut2-s1.iu.edu.http > uct2-6509.uchicago.edu.5554: S 71076625:71076625(0) ack 2253776509 win 5840
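The lost packets can be found mechanically by comparing the two dumps. The 6509 rewrites the client's source port (37766 becomes 5551, and so on). As the dumps show, it leaves the TCP sequence numbers untouched, so a connection can be matched across both captures by the client's initial sequence number (ISN). The following minimal sketch assumes text output in exactly the format above: one packet per line, with the SYN flag printed as a bare "S". Newer tcpdump versions print "Flags [S]" instead and would need a different pattern. The function names are hypothetical.

```python
import re

# tcpdump lines carrying the SYN flag ("S"); SYN-ACKs also carry "ack <n>".
SYN_RE = re.compile(
    r"^\S+ IP \S+ > \S+: S (?P<seq>\d+):\d+\(\d+\)(?: ack (?P<ack>\d+))?"
)


def handshake_isns(lines):
    """Return (syn_isns, synack_isns) from tcpdump text lines.

    syn_isns: client ISNs seen in plain SYNs.
    synack_isns: client ISNs acknowledged by SYN-ACKs (ack - 1).
    """
    syn_isns, synack_isns = set(), set()
    for line in lines:
        m = SYN_RE.match(line.strip())
        if m is None:
            continue  # not a SYN or SYN-ACK
        if m.group("ack") is None:
            syn_isns.add(int(m.group("seq")))
        else:
            synack_isns.add((int(m.group("ack")) - 1) % 2**32)
    return syn_isns, synack_isns


def lost_synacks(client_lines, server_lines):
    """Client ISNs the server answered with a SYN-ACK the client never saw."""
    _, seen_by_client = handshake_isns(client_lines)
    _, sent_by_server = handshake_isns(server_lines)
    return sorted(sent_by_server - seen_by_client)
```

Fed the two dumps above, lost_synacks should single out ISN 2253776508, which is the stuck connection from client port 37768.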
As part of this investigation, I set up one of our recently deprecated head nodes as a NAT server using ipchains and used it as the default gateway for a few test nodes. I could reliably reproduce the failure when routing through the 6509, and the transfers succeeded 100% of the time through my test NAT server. Clearly this is not a complete test, since a single Linux NAT server with 1-2 clients behind it will not behave the same as the 6509 with hundreds of nodes and thousands of clients behind it. It does show, however, that a machine behind a different NAT, on otherwise the same physical network, works correctly.
So far, ping/ICMP tests and both TCP and UDP iperf tests have not reproduced the problem.
Configurations
Our NAT is performed by the Cisco 6509, which translates the inside addresses permitted by access list 1 to its single outside address (pool uct2t3, 128.135.158.241), using port overloading.
The NAT-specific config is as follows:
ip nat translation tcp-timeout 600
ip nat translation udp-timeout 600
no ip nat service skinny tcp port 2000
no ip nat service H225
ip nat pool uct2t3 128.135.158.241 128.135.158.241 netmask 255.255.255.128
ip nat inside source list 1 pool uct2t3 overload
no ip classless
ip default-network 128.135.0.0
ip default-network 10.0.0.0
no ip forward-protocol udp tftp
no ip forward-protocol udp domain
no ip forward-protocol udp time
no ip forward-protocol udp netbios-ns
no ip forward-protocol udp netbios-dgm
no ip forward-protocol udp tacacs
ip route 10.0.0.0 255.255.0.0 128.135.158.129
access-list 1 permit 10.1.2.0 0.0.0.255
access-list 1 permit 10.1.3.0 0.0.0.255
access-list 1 permit 10.1.4.0 0.0.0.255
access-list 1 permit 10.1.5.0 0.0.0.255
access-list 1 permit 10.1.6.0 0.0.0.255
access-list 1 permit 10.1.7.0 0.0.0.255
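Cisco access lists use wildcard masks, in which 1 bits mark "don't care" positions, so each entry above matches one /24 (10.1.2.0 through 10.1.7.0). The following minimal sketch checks whether a given inside address would be selected by ip nat inside source list 1. The helper acl_permits is hypothetical, and it models only standard-ACL address matching, including the implicit deny at the end of the list.

```python
import ipaddress

# Entries of "access-list 1" above: (address, wildcard mask).
ACL1 = [(f"10.1.{n}.0", "0.0.0.255") for n in range(2, 8)]


def acl_permits(ip, acl=ACL1):
    """True if a standard ACL with wildcard masks permits this address."""
    addr = int(ipaddress.IPv4Address(ip))
    for net, wildcard in acl:
        # Wildcard 1 bits are "don't care"; compare only the remaining bits.
        care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
        if addr & care == int(ipaddress.IPv4Address(net)) & care:
            return True
    return False  # implicit "deny any" at the end of every Cisco ACL
```

For example, uct2-mgt at 10.1.2.232 is matched (and would be NAT'd for external destinations), while an address in 10.1.1.x or 10.1.8.x is not.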
--
AaronVanMeerten - 08 Mar 2011