Prolonged unscheduled maintenance

topic posted Sat, May 30, 2009 - 12:01 AM by  carolyn
Thursday night's scheduled maintenance unfortunately turned into all-day unscheduled maintenance on Friday. As soon as I had decided to cancel the scheduled maintenance because a DNS server had failed, taking the load balancer configuration down with it, the master database suddenly decided to turn itself off again. When that happens, as it did last week, it takes several hours just to get the hardware running again, and then another 12 hours or so to repair the database damage.

We realize the last several outages have been unusually long and appear to be more frequent, so we are actively working on:
1. Building a new database server
2. Outfitting the data center with more remote access tools (to reduce the need to travel there to fix problems)
3. Reworking the database repair procedures to reduce the downtime when catastrophes happen

This is all in addition to continuing to rework the architecture and codebase and to improve the site overall.

Again, thank you for your patience. We know how difficult this has been for everyone.
posted by:
carolyn
SF Bay Area

Recent topics in "Tribe.net Company Blog"