Nurd Land: September 2010

Thursday, 30 September 2010

CCNP Switch Fail

I recently failed my CCNP SWITCH exam. Only by a few marks (< 5%), but enough nonetheless. At first I was upset, but now I've come to realize that I wasn't thinking the right way about a lot of the questions, particularly the simulations.

Previously, I had seen sims in Cisco exams which clearly stated the end goal in technical terms, but this exam was more about giving a set of requirements and letting you figure out what was needed. During the exam I wasn't thinking through the requirements at all, just my own set of technical goals, e.g. must get this EtherChannel up.

Having had time to think about it, I now realize that this comes down to my lack of experience (despite doing Cisco networking for 8 years) of doing design work. I have never sat down at a planning meeting with a client to determine their requirements, nor have I been involved in a peer review process in an engineering team.

My networking role is just me looking after a large network (> 800 virtual users) all by myself, with no other network engineers and no 'client'.

The end result is that I know my network very well, but I have never had to plan or design one from scratch, which showed up in the exam.

I want to be a good engineer but planning and design are hard to study for unless you've had the exposure.

Never fear, I will keep at it and I hope now I at least better understand what is required of these types of questions in the exam.

Monday, 13 September 2010

Enabling rapid-pvst

This post is about my attempts to enable rapid-pvst (802.1w) on one of my switch blocks at work. My previous attempt had resulted in lots of loopback errors disabling up-links on the access switches.

My only thought at the time was that perhaps I had way too many end-to-end VLANs that took too long to converge. Having read up on rapid-pvst, I now believe it was probably the fact that during the process of enabling rapid-pvst, the default pvst has to be switched off and for a small period you are running without any spanning tree. If you have enough traffic then the probability is high that you could get a loop during this interval. It could still be a combination of effects though, and since my original failure I have been aggressive with switchport trunk allowed vlan ... to restrict the number of end-to-end VLANs and have been rewarded with success.

I have now managed to get rapid-pvst working on one of the switch blocks with which I originally had problems. This time though, knowing more, I took a cautious approach.

Step 1, Enable loopback errdisable recovery so that if for some reason the original problem recurred I wouldn't have to get console access to my access switches or reboot them. This can be done with: errdisable recovery cause loopback
Step 2, Turn on console messages so I can see any errors that might occur: terminal monitor
Step 3, Enable rapid-pvst and wait: spanning-tree mode rapid-pvst
Step 4, Save your work once the switch is stable.

I noticed on each access layer switch, the management interface went down and came back up. Once I had completed all the access layer switches, I then did the core switch (only one) and once finished I had a stable switchblock running rapid-pvst. Here is the output of show span vlan 188:

VLAN0188
  Spanning tree enabled protocol rstp
  Root ID    Priority    24764
             Address     001b.8f97.2180
             This bridge is the root
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    24764  (priority 24576 sys-id-ext 188)
             Address     001b.8f97.2180
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time 300

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi0/1               Desg FWD 4         128.1    P2p
Gi0/2               Desg FWD 4         128.2    P2p
Gi0/3               Desg FWD 4         128.3    P2p
Gi0/4               Desg FWD 4         128.4    P2p
Gi0/5               Desg FWD 4         128.5    P2p
Gi0/6               Desg FWD 4         128.6    P2p
Gi0/7               Desg FWD 4         128.7    P2p
Gi0/8               Desg FWD 4         128.8    P2p
Gi0/9               Desg FWD 4         128.9    P2p
Gi0/10              Desg FWD 4         128.10   P2p
Gi0/11              Desg FWD 4         128.11   P2p
Gi0/12              Desg FWD 4         128.12   P2p
Gi0/17              Desg FWD 19        128.17   P2p Edge
Gi0/27              Desg FWD 4         128.27   P2p Peer(STP)
Gi0/28              Desg FWD 4         128.28   P2p Peer(STP)

This switch block is one of the ones that I had the loopback issue with so I am happy to have gotten it working here. I will now rinse and repeat and see if I can get it working everywhere!
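
As a side note on the output above, the Bridge ID line shows how the displayed priority is built: with the extended system ID, the 16-bit priority field is a 4-bit configurable priority (a multiple of 4096) plus the 12-bit VLAN ID. The following is only an illustrative sketch of that arithmetic; the function names are made up for this example and are not part of any Cisco tool.

```python
# Illustrative sketch only: function names are hypothetical.
# With the extended system ID, the 16-bit bridge priority field is split
# into a 4-bit configurable priority (multiples of 4096) and a 12-bit VLAN ID.

def bridge_priority(base_priority, vlan_id):
    """Return the per-VLAN priority as displayed in the Bridge ID line."""
    if base_priority % 4096 != 0 or not 0 <= base_priority <= 61440:
        raise ValueError("base priority must be a multiple of 4096 in 0..61440")
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    return base_priority + vlan_id


def split_priority(priority):
    """Split a displayed priority back into (base priority, sys-id-ext)."""
    return priority & 0xF000, priority & 0x0FFF


print(bridge_priority(24576, 188))  # 24764, matching the VLAN 188 output
print(split_priority(24764))        # (24576, 188)
```

This is why every VLAN on the same switch shows a slightly different priority even though only one base value (here 24576) was configured.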

Thursday, 9 September 2010

Are we there yet? Sick of waiting for IPv6?

Just a quick one today.

I've been using IPv6 since the days of the 6bone and have been waiting ever since for commercial service. Initially I was only interested in IPv6 access at home but now that I look after my own commercial offering, I really want IPv6 available at work.

My provider (who also happens to be my parent company) has allowed me to join their trial of IPv6 via a tunnel from my Internet routers. This, along with a /48 allocation of address space, has at least allowed me to do a proof of concept and get all my infrastructure properly configured.

The big question is, with IPv4 running out within the year, why oh why am I still waiting for my service provider to get on board and provide IPv6 service?

I've now been told December this year, but only if I sign up to a new Internet connection. I've yet to be told if they will be charging extra for IPv6, which would be counterproductive. Given that my provider is my parent company, I have no choice but to wait, but what a long wait it's been.

Wednesday, 8 September 2010

The trouble with TCAM

Let me start by saying that whilst I am not a CCIE, I understand that the commands presented here are supposed to be CCIE level when it comes to switches.

The background is that I look after a network which uses layer 3 switches for the core/distribution layer. These are mostly 3560s. When I first started implementing the 3560 switches, I read that they supported IPv6. Being an early adopter when it came to IPv6, I sought to enable IPv6.

It turns out that you enable IPv6 on the 3560 family by repartitioning the CAM/TCAM tables using the sdm prefer command. This command dictates how much space is used for various kinds of resources, such as layer 2 entries, L3 routes, multicast routes and the mix between them.

Now when I first enabled IPv6 on my 3560s I didn't really understand what a TCAM was or why it was critical to layer 3 operations so I ended up making a choice that for years has impacted the performance of the network.

The command I used at the time was:

# sdm prefer dual-ipv4-and-ipv6 vlan

I figured at the time that I had a few VLANs and that would be the way to go. Here is the table showing the mix of resources you get when you choose this option:

The selected template optimizes the resources in
the switch to support this level of features for
8 routed interfaces and 1024 VLANs.

  number of unicast mac addresses:                  8K
  number of IPv4 IGMP groups + multicast routes:    1K
  number of IPv4 unicast routes:                    0
  number of IPv6 multicast groups:                  1K
  number of directly-connected IPv6 addresses:      0
  number of indirect IPv6 unicast routes:           0
  number of IPv4 policy based routing aces:         0
  number of IPv4/MAC qos aces:                      0.75K
  number of IPv4/MAC security aces:                 1K
  number of IPv6 policy based routing aces:         0
  number of IPv6 qos aces:                          0.5K
  number of IPv6 security aces:                     0.5K

Can you see something a bit strange here? This line is the issue:

number of IPv4 unicast routes:                    0

Since most of my network was still IPv4, this line allows no space in the TCAM for IPv4 unicast routes! That was most of my traffic. The net result was periodic spikes in CPU usage on the switches when significant traffic went through them. It wasn't until recently, when studying for my CCNP SWITCH exam, that I realized that these switches actually do routing in hardware for most traffic as long as there is room in the TCAM.

So I had a configuration that specifically did not have any room in the TCAM, so all IPv4 unicast routing on these switches was being done in software. Now the CPU in a 3560 isn't great but it's probably sufficient for low-level traffic, and having a dedicated backup LAN meant that a lot of heavy traffic wasn't routed, yet periodically there was enough traffic to spike the CPU. The CPU would max out at over 80%, which is enough to mean other services could suffer.
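
The kind of check that would have caught this early can be sketched in a few lines. This is a minimal, hypothetical example: it assumes the template text has already been captured from the switch (for instance by pasting the output shown above), that every resource line starts with "number of", and that "K" means 1024 entries, which is an assumption rather than something the output states.

```python
import re

# A few lines copied from the "vlan" template output above.
SAMPLE = """\
  number of unicast mac addresses:                  8K
  number of IPv4 unicast routes:                    0
  number of directly-connected IPv6 addresses:      0
  number of IPv4/MAC qos aces:                      0.75K
"""


def parse_sdm_template(text):
    """Map each 'number of ...' resource name to its entry count."""
    sizes = {}
    for line in text.splitlines():
        match = re.match(r"\s*number of (.+?):\s+([\d.]+)(K?)\s*$", line)
        if match:
            name, value, kilo = match.groups()
            # Assumption: "K" is treated as 1024 entries.
            sizes[name] = int(float(value) * (1024 if kilo else 1))
    return sizes


def empty_resources(sizes):
    """Return the resources that have no TCAM space allocated at all."""
    return [name for name, count in sizes.items() if count == 0]


sizes = parse_sdm_template(SAMPLE)
print(empty_resources(sizes))
# ['IPv4 unicast routes', 'directly-connected IPv6 addresses']
```

Any resource in that list that the network actually depends on (IPv4 unicast routes, in this case) means the traffic will be handled by the CPU instead of in hardware.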

Before you start thinking that I was a bit negligent letting this issue carry on for 'years' let me state that I had tried to debug this according to the methods suggested by Cisco.

I started out with doing:

#show proc cpu | ex 0.00

This shows the CPU table, excluding any process that isn't using any CPU. The output of this showed IP Input was the process taking up all the CPU. This is exactly what to expect if lots of traffic is getting punted to the CPU. The next step is to find out why. The command:

# show ip cef switching statistics

       Reason                          Drop       Punt  Punt2Host
RP LES TTL expired                        0          0          1
RP LES Features                           0       4881          0
RP LES Total                              0       4881          1
All    Total                              0       4881          1

This command shows what is causing the CPU punts to occur. TTL is obvious but features requires more detail:

# show ip cef switching statistics feature

IPv4 CEF input features:
       Feature                Drop    Consume       Punt  Punt2Host Gave route
       NAT Outside               0          0       4881          0          0
Total                            0          0       4881          0          0

IPv4 CEF output features:
       Feature                Drop    Consume       Punt  Punt2Host    New i/f
Total                            0          0          0          0          0

IPv4 CEF post-encap features:
       Feature                Drop    Consume       Punt  Punt2Host    New i/f
Total                            0          0          0          0          0

IPv4 CEF for us features:
       Feature                Drop    Consume       Punt  Punt2Host    New i/f
Total                            0          0          0          0          0

This command, in my case, showed huge amounts of NAT Outside Punts. At this point I was stumped. I searched repeatedly for anything that could trigger NAT and explain what was going on.

As you may have guessed by now, that output was a furphy: the problem had nothing to do with NAT.

From the above output of the sdm preferences, it is now obvious that my naive choice for the sdm preferences resulted in no space in the TCAM for IPv4 routes and thus all IPv4 routing was being done by the CPU using the IP Input process.

The solution? Simply change the sdm preferences to dual-ipv4-and-ipv6 default! The new template only takes effect after the switch is reloaded, and then looks like this:

"desktop IPv4 and IPv6 default" template:
The selected template optimizes the resources in
the switch to support this level of features for
8 routed interfaces and 1024 VLANs.

  number of unicast mac addresses:                  2K
  number of IPv4 IGMP groups + multicast routes:    1K
  number of IPv4 unicast routes:                    3K
    number of directly-connected IPv4 hosts:        2K
    number of indirect IPv4 routes:                 1K
  number of IPv6 multicast groups:                  1K
  number of directly-connected IPv6 addresses:      2K
  number of indirect IPv6 unicast routes:           1K
  number of IPv4 policy based routing aces:         0
  number of IPv4/MAC qos aces:                      0.75K
  number of IPv4/MAC security aces:                 1K
  number of IPv6 policy based routing aces:         0
  number of IPv6 qos aces:                          0.5K
  number of IPv6 security aces:                     0.5K

Now I have plenty of space for both IPv4 and IPv6 routes. What I lose is policy-based routing, but hey, that's something I can live with. Since this change I haven't had a single CPU spike (> 2 days now).

I have also since learnt that you don't get taught about SDM preferences until you study routing and switching at the expert level.

That's what you get for being an early adopter!

Monday, 6 September 2010

IPv6 Caveat - Apache & XP

I encountered an interesting problem with my Intranet today.

Certain pages were just hanging halfway through loading. Sure, that doesn't sound very exciting, but upon inspection of the source code I discovered it was a SOAP call that was hanging.

Trying a manual connection via telnet replicated the hang, with an interesting point: it was trying to connect via IPv6. Now I didn't recall manually entering an IPv6 AAAA record into my primary DNS, but it so happens that the target machine for this SOAP call is an XP box running the Windows version of the Apache web server and yes, I did recently enable IPv6 on that machine.

So, diagnosis (yes, short post today) is that whilst XP (SP3) supports IPv6 and Apache on other operating systems supports IPv6, it would appear that the two together do not support IPv6. Couple that with Active Directory automatic DNS updates and you have my problem.

The solution was to create a different URL for the SOAP service which only had an IPv4 record and update all the SOAP calls and WSDL files that described the SOAP service in the first place.
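
The fix above was done in DNS. A different, client-side way to get the same effect is to have the calling code resolve only IPv4 (A) records and connect to those. The following is a minimal sketch of that alternative using Python's standard socket module, not what was actually deployed; the function names are hypothetical.

```python
import socket


def ipv4_addresses(host, port):
    """Resolve host to IPv4 addresses only, ignoring any AAAA records."""
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def connect_ipv4(host, port, timeout=5.0):
    """Open a TCP connection over IPv4 only; the caller closes the socket."""
    last_error = None
    for address in ipv4_addresses(host, port):
        try:
            return socket.create_connection((address, port), timeout=timeout)
        except OSError as exc:
            last_error = exc
    raise last_error or OSError(f"no IPv4 address found for {host}")


# A numeric address needs no DNS lookup, so this works offline.
print(ipv4_addresses("127.0.0.1", 80))  # ['127.0.0.1']
```

The DNS approach has the advantage that every client is fixed at once, whereas this sketch only helps clients whose code you control.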

So, lesson to be learned: IPv6 support is still lacking despite being around for 15 years. I guess if I'd updated this box to Vista or Windows 7 it might be different, but I'm not sure. Before you ask, no, I can't install Linux; my security vendor doesn't support Linux, at least not for this product.

Oh well. Not everything is ready for IPv6 yet.

Friday, 3 September 2010

Apple's Ping beset with Spam

Ok, yet another social networking offering, this time by Apple. Apple appears to be doing a Google and trying to get in on everything. This one though seems to be specific to the music industry and our consumption of their product.

So I signed up to check it out. Nothing that exciting yet, but already I've noted spam: people setting up accounts simply to comment on anything and everything to tout their wares. The one I saw was, funnily enough, about how to get a free iPhone.

Time will tell what Apple chooses to do about this. I've yet to see a web interface for Ping, which is telling. If you can only access Ping via iTunes then it will be much less accessible than other social services.

Who knows if the 'net public will want to sign up for yet another social service. I think the only people who will are those Apple already has a relationship with, i.e. iPhone and iPod users who have an iTunes account.

I can't see the draw, can you?

Wednesday, 1 September 2010

Bring on the NBN

By now, I think, the general public are probably sick of hearing about the NBN despite it being such a 'hot' election issue. Election? When was that? It was so long ago, I've forgotten, but we can still hear about the NBN.

Being a tech-head, I've done my share of cringing at the reasons people give for why it's a bad idea. Most average people I've polled say it's a good idea but too expensive. To that, I'd like to quote an expert:

"The ubiquitous use of high capacity across the entire population is intended to alter the way in which services are delivered, in which we define work and entertainment and the way in which a relatively small population in the south Pacific Ocean defines its place as a developed and hopefully highly competitive economy in a global context. These are indeed great expectations and the price tag is entirely commensurate with the level of euphoric optimism that is associated with this national project." - Geoff Huston, Chief Scientist at APNIC

I happen to agree. The NBN is the same style of public work as the original copper network was back in its day. Perhaps the same debate was had then, but I don't think it was an election issue. The public wasn't asked to choose a technology solution for the nation's future network.

That's what it comes down to. Each side has offered a solution and the public has been asked via an election to choose which they want. Of course the public is in no position to choose based on technical merits, so instead they go with all they know: price. But how on earth does the public weigh up the price of an NBN? Can't find them at Coles or Woolies or even at Dick Smith Powerhouse!

I would like to state a simple reason for the NBN: we need it for all the reasons we can't think of right now. It's not about faster porn, or downloading illegal movies, though many might use those as arguments. Think more about providing cable TV to all those that don't have a big black cable hanging on their telegraph poles. Think about making a national community TV station. Think about having the ability to watch your recorded shows from your set top box at your mate's place (copyright permitting).

These are things you can only do if most people have the same service. I remember a few years back trying to do the web cam thing with my sister in the UK. We did it because it was nice to see a face but it almost always deteriorated into a frozen image because neither of us had the sort of bandwidth required. Well imagine being able to do something similar but with the quality you expect from TV, HD TV even, across the globe. That would make this planet we live on seem a whole lot smaller. It wouldn't be such a separation to live in another country from your nearest and dearest. But a broadband policy that merely seeks to add some more people onto our aging copper network is just not good enough.

Copper and even wireless have limits. The further you are from either, the slower it gets. The speeds 'suggested' for each are never obtainable, so whilst it might be said that everyone will get 12Mb, that's just a suggested figure and your mileage may vary. Certainly my existing service is billed as ADSL2+ yet I only get 4Mb. There is very little chance they are going to build a new exchange close to my residence, so I have nothing to look forward to. No Internet TV, no HD video calls.

So I for one, welcome our new NBN overlords...
