From 2024-01-29 15:00 UTC to 2024-02-01 06:00 UTC, an issue affecting our IAM and WAC API operations could cause slow responses to client requests. The root cause was a high volume of duplicate requests, each of which required multiple services to communicate and had to be processed in the order received. This volume created a backlog in our billing subsystem, which could not respond to requests as quickly as they arrived. Because of this bottleneck, all IAM and WAC API requests were delayed until they could be processed in the order in which they were received.
Although the duplicate requests began at 15:00 UTC on 2024-01-29, our system kept up with the request rate until approximately 12:30 UTC on 2024-01-31, when we were notified of increasing delays in IAM and WAC API requests. At 16:00 UTC on 2024-01-31, our team identified and blocked the source of the duplicate requests. At 17:00 UTC on 2024-01-31, our Operations and Engineering teams began the recovery process to complete all queued requests and streamline the acceptance of new requests. By 06:00 UTC on 2024-02-01, the recovery process was complete and all systems were fully operational, with normal response times restored for our IAM and WAC APIs.