From 2024-04-03 11:55 UTC to 2024-04-03 12:25 UTC, we experienced a global outage of all Wasabi services, including S3, IAM, STS, WACM, WAC API, and Console. Our us-west-1 region had an extended outage that lasted until 13:06 UTC. At 11:55 UTC, our alerting system notified our Operations Team that our global database was beginning to experience memory-related issues outside its normal operating range. Ten minutes later, at 12:05 UTC, the database crashed. As a result, all services and APIs stopped accepting incoming client requests, causing a global outage of all services.
At 2024-04-03 12:05 UTC, our Operations Team began the manual process of rebooting the database server instance to restore database operation. Once the server had been rebooted and all safety checks were complete, service was restored at 12:25 UTC in 12 of our 13 regions, with us-west-1 being the exception. The regional servers in us-west-1 had difficulty re-establishing their connection to our global database, so our Operations Team manually restarted these servers to restore the connection. By 13:06 UTC, services were restored in our us-west-1 region.