On Wednesday, February 12th, at 06:17 ET, Site Service began experiencing heavy CPU usage in one region, which degraded performance at the API level and in the dependent products Composer and WebSked. Capacity was added and service was fully restored by 06:52 ET.
The issue was caused by an incorrect cache configuration in a release that day, which increased load across the systems. Once the configuration was corrected, performance returned to normal.
All times are in ET, using a 24-hour clock.
Time | Event |
---|---|
06:15 | Latency increases slightly |
06:17 | Latency increases, service becomes degraded |
06:42 | Capacity is increased |
06:47 | New instances come online and latency starts decreasing |
06:52 | Service is fully restored |
The capacity for this region has been scaled up, and additional monitoring will be added to the service to better identify the source of the CPU increase. Furthermore, additional checks will be implemented around the PageBuild Engine integration systems to monitor traffic.