Hi all. I know the chip guide errors have been a problem for the past several days, and I wanted to let you know what's going on. I didn't run this by the board or Charles, so I hope I'm not overstepping my authority by publishing this update.
The basic problem is that the current server is overloaded. Some features have always been slower than others (trademaker is a good example, as are certain query facility searches), but right now someone or something is hammering the system with requests. We know which IP address the traffic is coming from, but because of the way the current server is set up (no root admin access), we can't get in to block traffic to or from it. The chipguide script has a built-in "banned IP address" check, but adding the offending IP address to it didn't reduce the traffic one bit. So whatever is sending these requests is bypassing all of the normal security checks that were put in place.
The 500 errors you've all been getting happen because the server load is so high (a load average above 30, versus 1-2 on a normal system) that the web server stops accepting new requests. Once the load average drops, requests resume. How long the site stays "down" depends on how long the server remains badly overloaded, which is why the downtime varies.
Adding to the problem, the hosting company's support is horrible. Charles asked them to look at the problem, or at least answer some questions I needed answered, and they basically said they couldn't or wouldn't do anything: it was an internal issue we'd have to figure out on our own (which we can't, because we don't have the required access levels).
Charles will likely upgrade the system to get more resources (more and faster processors, more RAM) and more admin capabilities (root shell access, Plesk system management). To do this, the site will probably need to be taken offline for a few hours. The timing is up to the hosting company, but I'm sure Charles will try to have them do it during off-peak hours. That said, given the stress the system is under right now, he may opt to have them do it at their earliest convenience, without advance notice to users. If that happens, he or I will try to post a message here and on the CCA forums letting you know it's underway.
I hope this answers at least some of your questions and reassures you that we volunteers have been working all hours to find a solution as quickly as we can, fitting it in around our day jobs and personal lives.
The site is backed up, so there are no concerns about loss of data. Submissions and updates should continue to be sent in as normal. The transition to the new server shouldn't affect anything other than some log files (who cares?) so it's just (slow) business as usual for now.
Thank you for your patience.
-- Barry Sherwood