**Improving Podcast Accessibility with Human Evaluation Word Error Rate (HEWER)**

Podcasting is a popular form of entertainment and information sharing that many people enjoy. However, without transcripts, podcasts can be difficult for individuals who are hard-of-hearing, deaf, or deaf-blind to access. Creating accurate and readable auto-generated transcripts is key to making podcasts accessible to everyone.

**Challenges with Automatic Transcripts**

The Apple Podcasts catalog contains millions of podcast episodes, which are transcribed using automatic speech recognition (ASR) models. However, measuring the accuracy and readability of these transcripts can be challenging. The traditional metric, word error rate (WER), counts every word-level discrepancy equally, so it cannot distinguish errors that hurt readability, such as misspelled proper nouns or capitalization errors, from differences that a reader would barely notice.
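
For reference, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of words in the reference. The following is a minimal Python sketch of that standard definition, assuming whitespace tokenization and case-sensitive matching. The function name and the sample sentences are illustrative and are not taken from Apple's evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with zero hypothesis words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a casing change, a misspelled name, and a dropped filler.
reference = "we spoke with Dr. Nguyen about uh podcast transcripts"
hypothesis = "we spoke with dr. Win about podcast transcripts"
print(f"{word_error_rate(reference, hypothesis):.3f}")  # 0.333
```

In this sample the dropped filler word "uh" counts exactly as much as the misspelled name, which illustrates why a raw WER figure says little about how readable a transcript is.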

**Introducing the HEWER Metric**

To address this gap, researchers at Apple developed the human evaluation word error rate (HEWER) metric. HEWER focuses on major errors that impact readability, such as misspelled proper nouns and certain punctuation errors, while ignoring minor errors like filler words or alternate spellings.
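
The idea of counting only major errors can be sketched in a few lines of Python. This is a hedged illustration, not Apple's implementation: it assumes human annotators have already tagged each transcript error with a type, that the error-type labels below (invented for this example, based on the categories named above) separate major from minor errors, and that the score is normalized by the reference word count in the same way as WER.

```python
from dataclasses import dataclass

# Hypothetical labels; the real annotation scheme is not described here.
MAJOR_ERROR_TYPES = {"misspelled_proper_noun", "major_punctuation"}
MINOR_ERROR_TYPES = {"filler_word", "alternate_spelling"}


@dataclass(frozen=True)
class AnnotatedError:
    token_index: int  # position of the error in the reference transcript
    error_type: str   # label assigned by a human annotator


def hewer_style_score(errors: list[AnnotatedError], reference_word_count: int) -> float:
    """Count only major errors and divide by the reference length (assumed)."""
    if reference_word_count <= 0:
        raise ValueError("reference_word_count must be positive")
    major = 0
    for error in errors:
        if error.error_type in MAJOR_ERROR_TYPES:
            major += 1
        elif error.error_type not in MINOR_ERROR_TYPES:
            raise ValueError(f"unknown error type: {error.error_type}")
    return major / reference_word_count


# Hypothetical annotations for a nine-word segment.
annotations = [
    AnnotatedError(4, "misspelled_proper_noun"),
    AnnotatedError(6, "filler_word"),
    AnnotatedError(3, "alternate_spelling"),
]
print(f"{hewer_style_score(annotations, 9):.3f}")  # 0.111
```

Only one of the three annotated errors is major, so the score is 1/9, while a metric that weighted all three equally would report 3/9 for the same segment.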

**Applying HEWER to Podcast Transcripts**

In a study of 800 segments from 61 podcast episodes, the HEWER score was just 1.4%, indicating that the ASR transcripts were more readable than a traditional WER measurement might suggest. Because HEWER counts only the major errors that affect readability, it gives a more targeted assessment of transcript quality than WER does.


By introducing the HEWER metric, Apple aims to improve the accessibility of podcasts for all users. Evaluating transcripts by the errors that actually impede reading gives both listeners and content creators a clearer picture of how usable an auto-generated transcript is than WER alone can provide.

