Title: Improving Automatic Speech Recognition with Federated Learning
In this blog post, the authors explore the use of Federated Learning (FL) to train end-to-end Automatic Speech Recognition (ASR) models. They examine the factors that can help minimize the performance gap between models trained with FL and their centrally trained counterparts.
The factors examined include:
– Adaptive optimizers
– Loss characteristics, varied by altering the Connectionist Temporal Classification (CTC) loss weight (see the sketch after this list)
– Model initialization through seed start
– Carrying over modeling choices learned from centralized training to FL
– FL-specific hyperparameters
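To make the CTC-weight factor concrete, below is a minimal PyTorch sketch of the hybrid CTC/attention objective commonly used in end-to-end ASR, where a single weight balances the two loss terms. The function name, shapes, and the default `ctc_weight` are illustrative assumptions, not the article's exact setup:

```python
import torch.nn.functional as F

def joint_asr_loss(log_probs, attn_logits, targets, input_lengths,
                   target_lengths, ctc_weight=0.3):
    """Hybrid CTC/attention loss for end-to-end ASR (illustrative).

    ctc_weight trades off the alignment-free CTC objective against the
    attention decoder's cross-entropy objective.
    """
    # CTC expects log-probabilities of shape (T, N, C)
    ctc = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                     blank=0, zero_infinity=True)
    # Attention decoder logits of shape (N, S, C) scored per token
    att = F.cross_entropy(attn_logits.transpose(1, 2), targets,
                          ignore_index=-1)
    return ctc_weight * ctc + (1.0 - ctc_weight) * att
```

Sweeping `ctc_weight` is one way to probe how the loss surface interacts with federated aggregation.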
The authors shed light on why some optimizers work better than others by inducing smoothness in the aggregated updates; a sketch of such a server-side update follows below. They also summarize the applicability of algorithms and trends from prior FL work and propose best practices for applying them to end-to-end ASR models.
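For context, here is a minimal NumPy sketch of the server-side update used by FedAdam and FedYogi, following the adaptive federated optimization formulation of Reddi et al.; the function name and hyperparameter defaults are illustrative, and the article may use different values:

```python
import numpy as np

def server_update(x, m, v, client_updates, opt="yogi",
                  lr=1e-2, beta1=0.9, beta2=0.99, tau=1e-3):
    """One aggregation round with an adaptive server optimizer.

    x: flattened global model parameters
    m, v: server-side first/second moment estimates (initialize v to tau**2)
    client_updates: list of client deltas (client_params - x)
    """
    delta = np.mean(client_updates, axis=0)     # average client update
    m = beta1 * m + (1 - beta1) * delta         # first moment
    sq = delta ** 2
    if opt == "adam":
        v = beta2 * v + (1 - beta2) * sq        # exponential moving average
    else:
        # Yogi: sign-controlled update changes v only gradually,
        # smoothing the effective per-coordinate learning rate
        v = v - (1 - beta2) * sq * np.sign(v - sq)
    x = x + lr * m / (np.sqrt(v) + tau)         # server step
    return x, m, v
```

The sign term in Yogi adjusts the second moment more conservatively than Adam's exponential moving average, which is one intuition for the smoother central updates observed in the figure.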
A figure in the article shows the overlap among central model updates for the Yogi and Adam optimizers over the first 50 aggregation rounds. The wider diagonal white band for Yogi reflects the additional smoothing it achieves, reducing the effect of heterogeneity among client updates.
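One plausible way to compute such an overlap plot, assuming it measures directional agreement between per-round updates (an assumption about the figure, not stated in the article), is a pairwise cosine-similarity matrix:

```python
import numpy as np

def update_overlap_matrix(updates):
    """Pairwise cosine similarity between per-round central updates.

    updates: array of shape (rounds, num_params). A wider bright band
    around the diagonal means successive updates point in similar
    directions, i.e. smoother aggregation across rounds.
    """
    normed = updates / np.linalg.norm(updates, axis=1, keepdims=True)
    return normed @ normed.T  # (rounds, rounds) similarity matrix
```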
In conclusion, this research shows how FL can be used to train ASR models effectively and provides valuable insights for future work in the field.