Composer / WebSked service interruption.

Incident Report for Arc XP

Postmortem

Customer Impact

On Sunday, January 12, from 17:54 to 23:03 ET, some clients experienced degraded Content API Search performance and slow content loading in Composer. Infrastructure was scaled up and performance was restored.

Root Cause

On January 12, a workload with poor performance characteristics overloaded part of the Arc XP infrastructure, slowing content loading in Composer and degrading performance for search requests. Infrastructure was scaled up, and additional rate limits were put in place to speed up recovery. Once scaling concluded and performance was restored, the limits were lifted and the system returned to its nominal state.

To prevent a recurrence, additional restrictions were placed on the offending workload.

Timeline

All times are ET, using a 24-hour clock.

Time Event
16:30 Cluster write performance starts to degrade
17:20 Automated alerts notify engineers of instability in Content API; the team starts investigating
19:31 Cluster resources are scaled up
20:20 First customer ticket opened about slowness in Composer
21:38 Rate limits added
22:45 Infrastructure scaling complete
22:50 All metrics return to safe levels
23:00 All customer traffic restored
23:03 System fully restored

Arc Next Steps

  • The offending workload will be isolated and its rate limits tightened to prevent it from overloading the system while we analyze it further and improve its performance profile.
  • Alert thresholds for write performance will be lowered so that preventive measures can be taken earlier, before system performance degrades.
Posted Jan 21, 2025 - 15:53 EST

Resolved

This incident has been resolved.
Posted Jan 12, 2025 - 23:26 EST

Update

Service is restored. You can expect normal operation with no further interruption from this point.
Posted Jan 12, 2025 - 22:50 EST

Update

Services are pending restoration.
Posted Jan 12, 2025 - 22:45 EST

Update

We are continuing to investigate and monitor the issue.
Posted Jan 12, 2025 - 22:40 EST

Update

We are continuing to investigate this issue.
Posted Jan 12, 2025 - 22:04 EST

Investigating

Composer / WebSked service interruption.
To mitigate a temporary loss of infrastructure capacity, customer traffic in US-East-1 has been reduced to preserve search availability.
Posted Jan 12, 2025 - 21:56 EST
This incident affected: Creator Apps (Composer, WebSked, PageBuilder Editor) and Content Platform (Publishing Platform).