Degraded generation quality on public voices

Incident Report for Cartesia AI

Postmortem

Overview

  • During a maintenance upgrade to Cartesia’s voice metadata, existing metadata for a narrow subset of default Cartesia voices was unintentionally overwritten, causing voice changes.
  • Some voices were affected more than others: a narrow subset of voices saw increased hallucinations, while 3 voices were significantly impacted.
  • Degradation occurred gradually as caches began expiring and was mitigated after a database restore and global cache purge.
  • In addition to fixing the root cause, we are updating our change management and monitoring process to prevent issues like this in the future by enforcing data upgrade safety via tooling, improving automatic detection, updating our triage process, and investing in a lower recovery time objective (RTO) for critical data like voices.

Detailed Analysis

Timeline (UTC)

  • 2026-02-25 04:57 – Upgrade completes; overwritten voice metadata starts being served as caches expire. We regularly test our model for regressions, but since only a small subset of voices was affected, the regression was not caught by our automated testing in this case.
  • 2026-02-25 23:00 - 2026-02-26 17:00 – We receive reports of voice changes and begin internal investigation to assess scope and severity. The on-call engineer investigated but made an incorrect determination of the source and severity of the issue, which delayed remediation.
  • 2026-02-26 18:40 – Impact triaged to high severity.
  • 2026-02-26 19:27 – Root cause identified; mitigation plan begins.
  • 2026-02-26 22:13 – Data fully restored; global cache purge begins.
  • 2026-02-26 23:06 – Global cache purge completes. Customers begin confirming resolution.
  • 2026-02-26 23:20 – Status page marked resolved.

Root cause

A bug in the code executing a maintenance upgrade to Cartesia’s voice metadata caused the metadata for some existing voices to be regenerated and overwritten. This metadata is fundamental to how we represent voices and changes to this metadata can lead to changes in voice output.

The code path containing the bug was executed because some default voices did not fulfill an invariant of the metadata upgrade process and were incorrectly identified as requiring an upgrade. The issue was not caught in manual or automated testing because it only reproduces in a specific state that exists in production and affects a narrow subset of voices.
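To illustrate the failure mode described above, here is a minimal sketch of the bug pattern. All names and the schema are hypothetical, not Cartesia's actual code: the point is that a check which conflates "violates the invariant" with "is on an old schema" will silently regenerate (and overwrite) metadata for voices that should have been flagged instead.

```python
def needs_upgrade(voice: dict) -> bool:
    # Buggy check: a voice that violates the invariant (missing field)
    # is indistinguishable from a voice on an old schema, so it falls
    # into the "regenerate metadata" path and gets overwritten.
    return voice.get("schema_version") != "v2"

def safe_needs_upgrade(voice: dict) -> bool:
    # Safer check: voices that violate the invariant abort the upgrade
    # for manual review instead of being silently regenerated.
    version = voice.get("schema_version")
    if version is None:
        raise ValueError(f"invariant violated for voice {voice['id']!r}")
    return version != "v2"

voices = [
    {"id": "a", "schema_version": "v1"},  # genuinely needs the upgrade
    {"id": "b", "schema_version": "v2"},  # already up to date
    {"id": "c"},                          # violates the invariant
]

# The buggy check wrongly selects voice "c" for regeneration.
print([v["id"] for v in voices if needs_upgrade(v)])  # ['a', 'c']
```

With the safer variant, voice "c" raises instead of being overwritten, which turns a silent data-corruption bug into a loud pre-upgrade failure.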

Learnings and Next Steps

We sincerely apologize for this incident and the disruption it caused to your business. We understand that reliable voice quality is fundamental to your trust in Cartesia.

We are making the following corrections to our change management process and monitoring process to prevent issues like this in the future, targeting every step of the release and error recovery lifecycle:

  1. Safer processes for routine data changes, enforced via tooling: We are investing in automated tooling to make our existing change management process even more thorough. We will incorporate additional automated tooling to triage the risk of changes, do dry runs on realistic data, and ensure multiple sign-offs beyond code review with automatic rollbacks in case of errors.
  2. Improved automatic detection: We are expanding our automatic voice regression testing to cover a much larger set of voices and transcripts.
  3. Updated triage process: The issue was initially triaged incorrectly. We will update our triage playbook to ensure that any reported voice issues are comprehensively checked against recent code and data changes, speeding up root-cause analysis (RCA) and scoping. This will ensure future issues are escalated correctly and faster.
  4. Invest in a lower RTO: Our RTO for our voice infrastructure is currently 4 hours. For critical data like voices, we will invest in bringing our RTO down to 15 minutes by automating the finer-grained data snapshot and restoration steps that are currently manual or slow.
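The dry-run and automatic-rollback safeguards in step 1 can be sketched as a blast-radius guard: compute the proposed changes first, refuse to apply them when too many records would change for a "routine" upgrade, and keep a snapshot for rollback. This is a hypothetical helper with an illustrative threshold, not Cartesia's tooling:

```python
import copy

def apply_with_guard(records, transform, max_changed_fraction=0.05):
    """Apply transform to every record, but only if the fraction of
    records that would change stays under the threshold."""
    snapshot = copy.deepcopy(records)      # stand-in for a real backup
    proposed = [transform(r) for r in records]
    changed = sum(1 for old, new in zip(records, proposed) if old != new)
    if changed > max_changed_fraction * len(records):
        # Dry run failed: blast radius too large, keep the original data.
        return snapshot, False
    return proposed, True

records = [{"id": i, "version": "v2"} for i in range(100)]
records[0]["version"] = "v1"               # one record legitimately stale

def upgrade(r):
    return {**r, "version": "v2"}

result, applied = apply_with_guard(records, upgrade)
```

Here the legitimate upgrade changes one record and is applied, while a buggy transform that rewrote every record would exceed the threshold and be rejected before any data was overwritten.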

If you have any questions or concerns about this incident, please don't hesitate to reach out to our support team.

Posted Feb 28, 2026 - 02:20 UTC

Resolved

The incident has been resolved.
Posted Feb 26, 2026 - 23:20 UTC

Update

We confirmed that the fix has been rolled out and that TTS generation quality should return to normal. We are continuing to monitor the situation to ensure that the incident has been fully resolved.
Posted Feb 26, 2026 - 23:11 UTC

Monitoring

We have rolled out a fix globally that should take effect over the next 10-15 minutes.
Posted Feb 26, 2026 - 22:44 UTC

Update

We are continuing to work on a fix for this issue. A resolution is expected in 1 hour.
Posted Feb 26, 2026 - 22:14 UTC

Update

We are continuing to work on a fix for this issue.
Posted Feb 26, 2026 - 21:08 UTC

Identified

We have identified a degradation in speech generation quality for requests using public voices since 6:30 PM PST on February 25.
Posted Feb 26, 2026 - 20:45 UTC
This incident affected: Text to Speech (TTS) (Text to Speech (US), Text to Speech (EU), Text to Speech (APAC)).