Resolved -
This incident has been resolved.
Oct 23, 19:40 EDT
Update -
The backlog has been fully cleared. All past updates have been processed, and new updates are being handled live for both Composer and programmatic ingestion. Any remaining errors can be resolved by republishing affected content. Customers should now be able to work normally across all environments.
Oct 23, 12:56 EDT
Update -
The majority of content published in the past 12 hours has now been processed. New content may still experience delays while full recovery continues. We will confirm once all processing times have returned to normal levels.
Oct 23, 11:40 EDT
Update -
We've increased capacity, and content updates are now processing quickly. Full recovery may take a few more hours. We'll continue providing updates and confirm once systems are fully back to normal. Thank you for your patience as we work through this.
Oct 23, 10:45 EDT
Update -
As mentioned in our previous updates, we have significantly increased capacity, and the backlog of content updates is being processed rapidly. However, due to the large volume of updates, we expect full recovery to still take some time (a few hours, not days). We will continue to provide regular updates, as well as an "all clear" once we are back to normal.
We appreciate your patience as we work through this recovery.
Oct 23, 09:27 EDT
Update -
As mentioned in our previous updates, we have significantly increased capacity, and the backlog of content updates is being processed rapidly. However, due to the large volume of updates, we expect full recovery to still take some time (a few hours, not days). We will continue to provide regular updates, as well as an "all clear" once we are back to normal.
We appreciate your patience as we work through this recovery.
Oct 23, 09:13 EDT
Update -
Due to the large backlog of operations, some stories that were created, saved, or updated in the past couple of hours may not yet be processed. This may result in stories not showing up on the web or in Composer search. In most cases, simply republishing those stories via Composer is enough to resolve this.
We are adding capacity and taking steps to accelerate the processing of the backlog to remediate this situation as fast as possible.
We will continue posting updates on a regular basis.
Oct 23, 07:27 EDT
Update -
We have resolved the secondary issue with the Kafka cluster, and the processing pipelines are now working as expected. Changes in Composer (and other updates of priority "standard") should be reflected normally in Content API for all organizations. For programmatic updates (Arc-Priority: ingestion), there is a backlog that is currently being processed and may take up to a couple of hours to clear. Some of this backlog is within the range of our normal processing times for customers with large ingestion volumes, but delays may be more elevated for some.
If some important stories are out of sync, a re-publish in Composer should bring them to an updated status.
We will continue providing regular updates until all metrics are back to normal levels.
Oct 23, 06:36 EDT
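For integrators reading the update above, the two priority values it names ("standard" and "ingestion") travel as an Arc-Priority request header. The following is a minimal sketch of how a programmatic client might tag an update, not a definitive implementation. The function name `build_update` and the story payload shape are hypothetical, and this log does not say which endpoints honor the header, so the sketch only builds the headers and body and sends nothing.

```python
import json

# The two priority values named in these updates.
ALLOWED_PRIORITIES = ("standard", "ingestion")


def build_update(story, priority):
    """Return (headers, body) for an update tagged with an Arc-Priority header.

    Hypothetical helper: the payload shape is an assumption for illustration.
    """
    if priority not in ALLOWED_PRIORITIES:
        raise ValueError(f"unexpected Arc-Priority value: {priority!r}")
    headers = {
        "Content-Type": "application/json",
        "Arc-Priority": priority,
    }
    # Serialize the story as UTF-8 encoded JSON for the request body.
    body = json.dumps(story).encode("utf-8")
    return headers, body


# Bulk or programmatic traffic would be tagged "ingestion".
headers, body = build_update({"headline": "Example story"}, "ingestion")
print(headers["Arc-Priority"])
```

The returned headers and body can be passed to whichever HTTP client the integration already uses. During this incident, updates tagged "ingestion" were the ones described as backlogged.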
Update -
We have identified a secondary issue with our Kafka cluster and are working with our infrastructure provider to resolve it. This affects a subset of customers for whom stories are either delayed or not being processed. We will continue to provide updates as the situation progresses.
Oct 23, 05:10 EDT
Update -
Operations with Arc-Priority: ingestion are still being processed with a delay. However, as we continued our investigation, we became aware of a disruption for messages from Composer (messages with Arc-Priority: standard) for a subset of customers. Changes in Composer may not be reflected in Content API and thus on the web. This is being actively investigated.
Oct 23, 04:13 EDT
Update -
Operations with Arc-Priority: ingestion are still being processed with a delay. However, as we continued our investigation, we became aware of a disruption for messages with priority "standard" for a subset of customers. Changes in stories may not be reflected in Content API and thus on the web. This is being actively investigated.
Oct 23, 04:08 EDT
Update -
Many publishing features have returned to normal operation. We are continuing to monitor metrics as recovery continues.
Oct 23, 03:25 EDT
Update -
At this time, the Composer and Websked applications are expected to be responsive to search and publishing actions. Updates from any applications using Arc-Priority: standard are also expected to arrive in a timely manner.
Updates from customer applications using Arc-Priority: ingestion are expected to be slightly delayed but are actively being processed.
Oct 23, 02:20 EDT
Update -
System monitoring points to solid recovery metrics and delays are trending downwards. Engineering teams continue to be engaged until the event is resolved.
Oct 23, 01:33 EDT
Update -
Content operations continue to clear through the backlog queue. Monitoring by the full response team is ongoing.
Oct 23, 00:54 EDT
Update -
We have observed an improvement in content operations clearing through the backlog queue. Our team is still engaged after scaling infrastructure components and we continue to monitor the situation.
Oct 23, 00:15 EDT
Update -
We continue to process the backlog of content operations and are working to restore the stability of publishing times.
Oct 22, 23:42 EDT
Update -
We continue to process the backlog of content operations and are seeing an improvement in article publish times. The team is scaling additional infrastructure components to further increase processing velocity and restore publish times to normal levels.
Oct 22, 23:09 EDT
Update -
We are continuing to monitor the progress of mitigation of delays across the region.
Oct 22, 22:19 EDT
Update -
Photo Center and Video Center are back to full capacity and normal operation.
We are continuing to monitor for any further issues.
Oct 22, 21:33 EDT
Update -
We are continuing to monitor for any further issues.
Oct 22, 20:52 EDT
Update -
Processing times continue to improve as expected. We expect a return to normal operational levels within an hour.
Oct 22, 20:20 EDT
Update -
The cluster reboot has resolved the connectivity issues, and Kafka queues are now processing messages normally. The team has scaled up message topics and supporting infrastructure to accelerate backlog processing. Customers should see continued improvement as the queues clear and content operations return to normal performance.
Oct 22, 20:00 EDT
Update -
We are investigating a potential root cause related to a recent AWS patch that may have affected the stability of our Kafka cluster. As part of mitigation efforts, we are rebooting the cluster to re-establish component connectivity.
Oct 22, 19:47 EDT
Update -
We are now observing an increase in delays affecting content operations, including publishing in the us-east-1 region. Engineering is investigating elevated latency in the content synchronization layer and is working to mitigate the impact. Further updates will be provided as we learn more.
Oct 22, 19:09 EDT
Update -
We are continuing to monitor the progress of mitigation of delays across the region. Some customers will still see delays, but we expect this to improve over time.
Oct 22, 18:17 EDT
Monitoring -
A fix has been applied and is propagating across the system. The content operations delay is now about five minutes and is continuing to decrease.
Oct 22, 17:37 EDT
Identified -
A delay in our internal routing for content synchronization began at around 4:30 PM ET and led to a 10-minute delay in content operations. This incident is limited to customers located in the us-east-1 region.
Oct 22, 17:17 EDT