Unplanned Downtime Earlier Today (Thu Feb 13 14:59-15:21 UTC 2014)

February 13th, 2014

The primary database server crashed because of a full disk at around 6:59am PST (14:59 UTC) today. The Nagios monitoring system, which normally alerts us in time to prevent such outages, had also crashed and so sent no notice of the problem to the infrastructure team. By 7:21am PST (15:21 UTC) Jeff Sheltren had cleared enough space to bring the database server back online and restore normal operation.

We are sorry for any inconvenience the outage may have caused. To keep a monitoring failure from going unnoticed in the future, we are adding monitoring of the monitoring server itself. Brandon Bergren is also working to fix the issue with the cache_form table that caused the disk to fill up.