The Significance and Challenges of Code-Switching in Natural Language Processing (NLP)
A recent paper presented at the EMNLP Workshop on Computational Approaches to Linguistic Code-Switching (CALCS) examined code-switching (CS), the mixing of different languages within a single sentence. CS is common in everyday communication and poses challenges in many NLP scenarios. The study focused on two largely unexplored areas of real-world CS speech translation: streaming settings and translation to a third language, that is, a language not present in the source.
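To make the definition of code-switching concrete, the following minimal sketch locates the points in a sentence where the language changes. It assumes each token already carries a language tag; the function name `switch_points` and the hand-tagged English-Spanish sentence are hypothetical illustrations and are not taken from the paper or its datasets.

```python
from typing import List, Tuple


def switch_points(tagged_tokens: List[Tuple[str, str]]) -> List[int]:
    """Return the indices of tokens whose language tag differs from the previous token's."""
    points = []
    for i in range(1, len(tagged_tokens)):
        if tagged_tokens[i][1] != tagged_tokens[i - 1][1]:
            points.append(i)
    return points


# Hypothetical, hand-tagged sentence (token, language tag); not from the paper's data.
sentence = [("I", "en"), ("want", "en"), ("un", "es"), ("café", "es"), ("please", "en")]

# Each index marks the first token after a language switch.
print(switch_points(sentence))  # [2, 4]
```

A sentence with at least one switch point is code-switched under this simple definition. Real systems rarely receive language tags for free, which is part of what makes CS input difficult.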
Streaming Settings and the Need for Translation to a Third Language
Prior research on CS speech translation has shown promising results in end-to-end offline scenarios and for translation into one of the languages already present in the source (monolingual transcription). However, the study recognized the need to extend these capabilities to streaming settings and to translation into a language that does not appear in the source.
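The difference between the offline and streaming settings can be sketched as follows. This is a toy illustration, not the paper's method: `dummy_model` is a hypothetical stand-in for a speech translation model, and re-decoding the growing audio prefix after each chunk is only one simple way to simulate streaming.

```python
from typing import Callable, Iterable, Iterator, List

Model = Callable[[List[float]], str]


def offline_decode(frames: List[float], model: Model) -> str:
    """Offline: the model sees the complete utterance before producing output."""
    return model(frames)


def streaming_decode(chunks: Iterable[List[float]], model: Model) -> Iterator[str]:
    """Streaming: yield a hypothesis after each chunk, using only the audio seen so far."""
    seen: List[float] = []
    for chunk in chunks:
        seen.extend(chunk)
        yield model(list(seen))


def dummy_model(frames: List[float]) -> str:
    """Hypothetical placeholder; a real model would return translated text."""
    return f"<hyp from {len(frames)} frames>"


audio_chunks = [[0.1, 0.2], [0.3]]
full_audio = [frame for chunk in audio_chunks for frame in chunk]

print(offline_decode(full_audio, dummy_model))
for partial in streaming_decode(audio_chunks, dummy_model):
    print(partial)
```

In the streaming case, early hypotheses are produced from partial input, so the model may have to commit to output before a language switch later in the utterance has been heard.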
Extending the Fisher and Miami Datasets for Spanish and German Targets
To address these gaps, the researchers extended the Fisher and Miami test and validation datasets with new translation targets in Spanish and German. With this data, they trained a model for both offline and streaming speech translation and established baseline results for the two settings identified above: streaming and translation to a third language.