Dear LazyWeb, I could use some help. I feel like I’m asking the wrong questions, and need some nudges in the right direction.
noodl has been very helpful, I think, but his suggestions appear to be answering the question I’m asking, rather than the one I mean. I think.
We have Mongrel + Rails + Apache + proxy_balancer on Server Q out in the DMZ. MySQL is running on Server Z, inside. There’s a firewall rule to allow Q to talk to Z on the MySQL port. So far so good.
After N minutes, the firewall times out idle connections between Q and Z. N is configurable, of course, but that doesn’t fix anything, because people go home over the weekend, and N will eventually be reached. So increasing N postpones the problem, but doesn’t fix it.
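One commonly suggested way around an idle timeout like this is to keep the connection from ever looking idle, by running a cheap query before N elapses. The following is only a sketch of that bookkeeping in plain Ruby, not something taken from the setup above: `KeepAlive`, `touch`, `tick`, and the 60-second margin are all hypothetical names and values, and the block passed to `tick` is where a query such as `SELECT 1` would go.

```ruby
# Hypothetical helper (not part of Rails or Mongrel): tracks how long a
# connection has been idle and signals when a cheap query is due.
class KeepAlive
  attr_reader :pings

  # idle_limit: seconds of idleness the firewall tolerates (N, in seconds)
  # margin:     how many seconds ahead of that limit to ping (assumed value)
  def initialize(idle_limit, margin = 60, now = Time.now)
    @interval = idle_limit - margin
    raise ArgumentError, "margin must be smaller than idle_limit" unless @interval > 0
    @last_used = now
    @pings = 0
  end

  # Call whenever the application runs a real query.
  def touch(now = Time.now)
    @last_used = now
  end

  # Call periodically. Yields (so the caller can run a trivial query) only
  # when the connection has been idle for at least the interval.
  def tick(now = Time.now)
    return false if now - @last_used < @interval
    yield if block_given?
    @pings += 1
    @last_used = now
    true
  end
end
```

Something would still have to call `tick` on a schedule inside each mongrel process (a background thread, or an external cron job hitting a lightweight URL), and that part is not shown here.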
The problem is this. Once the N-minute timeout is reached, the connections to MySQL drop. I expected subsequent requests to the Rails application to reestablish the database connection, but they do not. (Perhaps I need to adjust my expectations.) Instead, requests to Rails after this point wait for an extended period and eventually end in a proxy timeout. The only way we’ve found to reestablish the MySQL connections is to restart the mongrel cluster.
Functions put in a before_filter to reconnect do not seem to be getting called once the connections have dropped. Indeed, the before_filter doesn’t even seem to be reached. It’s as though the hang is happening at some stage before the before_filter: Rails is trying to contact the database, and is waiting indefinitely for a response.
Placing a reconnect in the before_filter does work, and does reconnect, as long as the MySQL connection is still up. (Not useful, but interesting.) However, once the N minutes have expired and the connections have dropped, that code does not appear to be invoked at all.
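To make the idea concrete, here is a minimal sketch of a reconnect check that refuses to wait forever, written in plain Ruby so it runs without Rails. `StubConnection` and `ensure_connection` are hypothetical stand-ins, and the one-second limit is an arbitrary assumption. In a Rails app the corresponding calls would be `ActiveRecord::Base.connection.active?` and `ActiveRecord::Base.connection.reconnect!`.

```ruby
require 'timeout'

# Hypothetical stand-in for a database connection; not a real driver.
class StubConnection
  attr_reader :reconnects

  def initialize(hang_for = 0)
    @hang_for = hang_for # seconds the liveness check blocks (simulates a dead socket)
    @reconnects = 0
  end

  # Simulates a liveness check that blocks when the link has been dropped.
  def active?
    sleep(@hang_for) if @hang_for > 0
    @hang_for == 0
  end

  # Simulates opening a fresh connection.
  def reconnect!
    @hang_for = 0
    @reconnects += 1
  end
end

# Returns true when the connection is usable. Reconnects first if the
# liveness check fails or takes longer than `limit` seconds.
def ensure_connection(conn, limit = 1)
  alive = begin
    Timeout.timeout(limit) { conn.active? }
  rescue Timeout::Error
    false
  end
  conn.reconnect! unless alive
  conn.active?
end
```

One caveat: `Timeout` can interrupt the `sleep` in this stub, but whether it can interrupt a call that is blocked inside a native database driver is a separate question, and it would need to be verified against the real driver before relying on this.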
So it’s entirely possible that I’m asking the wrong questions, but my hope is that one of my knowledgeable readers will see this post and immediately say, Oh, sure, that’s the well-known problem that is solved like *this*, and *here* is the question you really should have been asking.