I mentioned before some of the tcp problems I’ve had with my A10 loadbalancer. I solved the client->a10 aspect of that connection and thought everything was fine. Turns out nope, the a10->server part does the same basic thing: the connection stays open on the server even though the a10 timed it out and has moved on. For webservers you don’t care, cuz it’s just one of a couple hundred apache children counting toward its own timeout, but for mysql it means a long-running query will keep running after its client is gone, chewing up resources that eventually add up to annoyances if not full-blown problems. So now I’m back in “my tcp proxy is fucking up my universe” support-ticket/bug-report land.
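While that ticket is open, one stopgap is a watchdog that kills queries which have outlived the proxy's timeout. Here is a rough sketch of the selection logic, not something from my actual setup: the function name and the 300-second timeout are made up for illustration, and the rows are assumed to be dicts keyed by the column names `SHOW PROCESSLIST` returns (`Id`, `Command`, `Time`). It only builds the `KILL` statements; fetching the processlist and running the statements is left to whatever mysql client you already use.

```python
from typing import Dict, Iterable, List

# Hypothetical value: set this to whatever timeout the proxy actually enforces.
PROXY_TIMEOUT_SECS = 300


def orphan_kill_statements(
    processlist: Iterable[Dict[str, object]],
    max_secs: int = PROXY_TIMEOUT_SECS,
) -> List[str]:
    """Build KILL statements for queries running longer than max_secs.

    Each row is a dict keyed by SHOW PROCESSLIST column names.
    """
    statements = []
    for row in processlist:
        # Only threads that are actively running a query are of interest.
        if row.get("Command") != "Query":
            continue
        # Time is the number of seconds the thread has been in its current state.
        elapsed = int(row.get("Time") or 0)
        if elapsed > max_secs:
            statements.append(f"KILL {int(row['Id'])}")
    return statements


if __name__ == "__main__":
    sample = [
        {"Id": 12, "Command": "Query", "Time": 900},
        {"Id": 13, "Command": "Sleep", "Time": 4000},
        {"Id": 14, "Command": "Query", "Time": 2},
    ]
    for stmt in orphan_kill_statements(sample):
        print(stmt)
```

The obvious catch is that this can't tell an orphaned query from a legitimately slow one, so it only makes sense if nothing is supposed to run longer than the proxy timeout anyway.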
Speaking of which, I wound up running into another bug in the NAT code that just started hanging random one-out-of-some-hundred connections until the box got rebooted. Turns out it was a known bug with a code fix already out, but it took quite the support adventure to get to that knowledge.
You always want to throw a temper tantrum and blame the vendor for this stuff, but since I’m being kinda ghetto and using a loadbalancer as a firewall I have to acknowledge I brought some of this on myself. Plus, even if it were “just” a firewall, I’ve been in a dozen pix/screen scenarios where tcp session/timeout/segment-size ruined my week. Loadbalancer/firewall adventures remind me of a description I once heard about knife fights: even when you win you get cut.