Archive for July, 2009

you fuck with tcp, you get fucked

Monday, July 20th, 2009

I mentioned before some of the tcp problems I’ve had with my A10 loadbalancer. I solved the client->a10 side of that connection and thought everything was fine. Turns out nope, the a10->server side does the same basic thing: the connection stays open on the server even though the a10 timed it out and has moved on. For webservers you don’t care, cuz it’s just one of a couple hundred apache children counting down toward its own timeout, but for mysql it means a long-running query will keep running and chewing up resources, which eventually adds up to annoyances if not full-blown problems. So now I’m back in “my tcp proxy is fucking up my universe” support-ticket/bug-report land.

Speaking of which, I wound up running into another bug in the NAT code that just started hanging random one-out-of-some-hundred connections until the box got rebooted. Turns out it was a known bug with a code fix already out, but it took quite the support adventure to get to that knowledge.

You always want to throw a temper tantrum and blame the vendor for this stuff, but since I’m being kinda ghetto and using a loadbalancer as a firewall I have to acknowledge I brought some of this on myself. Plus, even if it were “just” a firewall, I’ve been in a dozen pix/screen scenarios where tcp session/timeout/segment-size ruined my week. Loadbalancer/firewall adventures remind me of a description I once heard about knife fights: even when you win you get cut.

presale: 2 – 3 weeks for delivery. postsale: 4 – 5 weeks for delivery

Thursday, July 9th, 2009

Dell has hit me with this three times in the last two quarters. Combined, it’s added over a month to my project timeline, a month I did not have to spare.

I know I shouldn’t get my hopes up, the whole point of using dell is cheap solid hardware with parts service, but I really think next time I’m in a datacenter-building/server-vendor-shopping place I’m gonna take a much harder look around. The sales engineers were of marginal help; I wound up having to get all my answers from the techcenter wiki and a friend.

CentOS: How to scan the SCSI bus with a 2.6 kernel

Wednesday, July 8th, 2009

CentOS: How to scan the SCSI bus with a 2.6 kernel.

Looks like Tim stopped updating a while ago, but this is a handy post I wanted to have a link to here.
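For anyone landing here without the link handy: the usual way to do this on 2.6 kernels is to write the wildcard string "- - -" (channel, target, LUN) to each SCSI host's scan file under sysfs. Below is a minimal Python sketch of that approach. It may or may not match what the linked post does, the function name is made up for illustration, and it needs root to touch the real sysfs files.

```python
from pathlib import Path


def rescan_scsi_hosts(sysfs_root="/sys/class/scsi_host"):
    """Ask every SCSI host under sysfs_root to rescan its bus.

    Writes the wildcard "- - -" (all channels, targets, LUNs) to each
    host*/scan file and returns the names of the hosts that were poked.
    The sysfs_root parameter exists only so the function can be tried
    against a scratch directory instead of the live system.
    """
    scanned = []
    for scan_file in sorted(Path(sysfs_root).glob("host*/scan")):
        scan_file.write_text("- - -\n")
        scanned.append(scan_file.parent.name)
    return scanned


if __name__ == "__main__":
    # Run as root on the box where the new disk or LUN was attached.
    print(rescan_scsi_hosts())
```

New devices should then show up in dmesg and /proc/scsi/scsi.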

A10 AX2000 goofy tcp timeout behavior

Monday, July 6th, 2009

Ran into an obscure tcp bug/behavior the last few days on my AX2000 loadbalancer. By default the layer4 tcp loadbalancing mechanism decouples the tcp connections such that you have two separate ones: (client -> a10) + (a10 -> server). The backend (a10 -> server) has a timeout default of 120 seconds, but for some reason the session table only updates every minute so it is in effect a 180 second timeout (it will still work and show up in a “show session” command with an Age of 0 for the last minute).
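To make the timeout arithmetic concrete, here is a tiny Python model of one plausible reading of that behavior: a session only gets removed at the first table sweep after its idle timeout has passed. The function name and the assumption that sweeps land on exact multiples of the interval are mine, not anything from A10 documentation.

```python
import math


def reap_time(last_activity, idle_timeout=120, sweep_interval=60):
    """Return when an idle session would actually be removed.

    Assumes (hypothetically) that the session table is swept at exact
    multiples of sweep_interval, and that a session is dropped at the
    first sweep at or after last_activity + idle_timeout.
    All values are in seconds.
    """
    expiry = last_activity + idle_timeout
    return math.ceil(expiry / sweep_interval) * sweep_interval


if __name__ == "__main__":
    # Last packet seen one second after a sweep: the 120 second timeout
    # expires at t=121, but the session survives until the sweep at t=180.
    print(reap_time(1))        # 180
    print(reap_time(1) - 1)    # 179 seconds of effective idle time
```

Under this model the effective timeout lands somewhere between 120 and just under 180 seconds, depending on where the last packet falls relative to the sweep.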

client -> a10 has no such timeout (or it’s much longer), so if you have something with a long idle connection (like say, mysql_pconnect on a development webserver) you can easily run into a case where the front end tcp socket considers itself in an established state while the backend one has timed out and disappeared. The next packet through that connection will need a new backend tcp connection to be established. Only the A10 doesn’t do this. It just blackholes the traffic. No TCP RST, no automatically firing up a new connection, just … fail.

As a workaround you can apply a custom tcp template that will properly return a TCP RST. Search your cli reference PDF for “reset-rev” for info. In my mind this is somewhat contradictory to the whole notion of connection-pooling/connection-reuse, but I’m not sure if mysql would even allow the new socket to just reappear since it probably already cleaned up the session.
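If you can’t touch the loadbalancer config, a different mitigation (not the reset-rev template above, just a common client-side pattern) is to have the application throw away any connection that has sat idle longer than some threshold safely under the backend timeout, and open a fresh one instead. Here is a rough Python sketch; the class name, the 90 second default, and the connect callable are all hypothetical stand-ins for whatever your database library provides.

```python
import time


class RecyclingConnection:
    """Hand out a connection, replacing it once it has idled too long.

    connect  -- zero-argument callable that opens a new connection
                (hypothetical; wrap your DB library's connect call)
    max_idle -- seconds of idleness after which the old connection is
                discarded; pick something below the proxy's timeout
    clock    -- injectable time source, mainly to make testing easy
    """

    def __init__(self, connect, max_idle=90.0, clock=time.monotonic):
        self._connect = connect
        self._max_idle = max_idle
        self._clock = clock
        self._conn = None
        self._last_used = 0.0

    def get(self):
        now = self._clock()
        # Drop the connection if it has been idle for max_idle or longer.
        if self._conn is not None and now - self._last_used >= self._max_idle:
            close = getattr(self._conn, "close", None)
            if callable(close):
                close()
            self._conn = None
        if self._conn is None:
            self._conn = self._connect()
        self._last_used = now
        return self._conn
```

Call get() right before each query rather than holding on to the returned object. This does nothing for the half of the connection the server still thinks is open, so the server’s own idle timeout still has to clean that up.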

Your call is important to us.

Wednesday, July 1st, 2009

But not important enough to answer without making you sit on hold for 23 (and counting) minutes.

Chalk up some fail points for Entrust customer support.

Hahaha wait it gets better at 25 min they just dump you to voicemail. Now the clock is on to see how long till I get a callback.

edit: It gets better! An hour later still no callback, so I call them. I get told the office is now closed, their biz hours are 6am to 8pm. Now, it’s 6:15 by me (EDT)… so it seems clear to me that Entrust has found the secret island of Atlantis and is headquartered at GMT-3.

edit2: turns out they’re canadian and thursday was a holiday where they left early, and the voicemail didn’t reflect that