Cross-Attention Reprogramming for ASR: Bridging Discrete Speech Units and Pretrained Language Models

TL;DR

This paper introduces cross-attention reprogramming for ASR, a method that bridges discrete speech units and pretrained language models, improving speech recognition by learning the mapping between the two without fine-tuning the embedding layer.

Pei-Jun Liao, Hung-Yi Lee, and Hsin-Min Wang
IEEE Access, Volume 14
https://doi.org/10.1109/ACCESS.2025.3649090

In automatic speech recognition (ASR), an emerging trend involves converting continuous speech features into sequences of discrete speech units (DSUs) via quantization. A key advantage of DSU representations is their compatibility with pretrained language models (PLMs), where DSUs are directly mapped to PLM token indices and the embedding layer is fine-tuned. However, this conventional strategy of...
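
The paper's full method is not reproduced here, but the idea of replacing the index-mapping-plus-fine-tuning step with a cross-attention bridge can be sketched as follows. In this hypothetical PyTorch sketch, a small trainable DSU embedding table provides the queries, the frozen PLM input-embedding matrix provides the keys and values, and the attention output is fed to the PLM as input embeddings. All module names, dimensions, and design choices below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionReprogrammer(nn.Module):
    """Hypothetical sketch: map discrete speech units (DSUs) into a frozen
    PLM's input-embedding space via cross-attention, instead of reusing
    PLM token indices and fine-tuning the embedding layer."""

    def __init__(self, num_units: int, plm_embedding: nn.Embedding, n_heads: int = 8):
        super().__init__()
        d_model = plm_embedding.embedding_dim
        # Small trainable embedding table for the DSU vocabulary (queries).
        self.unit_embed = nn.Embedding(num_units, d_model)
        # Frozen PLM embedding matrix supplies keys and values.
        self.plm_embed = plm_embedding
        self.plm_embed.weight.requires_grad_(False)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, unit_ids: torch.Tensor) -> torch.Tensor:
        # unit_ids: (batch, seq_len) integer DSU indices
        q = self.unit_embed(unit_ids)  # (B, T, d)
        # Use the whole (frozen) PLM vocabulary as keys/values.
        kv = self.plm_embed.weight.unsqueeze(0).expand(unit_ids.size(0), -1, -1)  # (B, V, d)
        # Cross-attention "reprograms" each DSU into a mixture of PLM token embeddings.
        out, _ = self.attn(q, kv, kv)
        return out  # pass to the frozen PLM as input embeddings (e.g., inputs_embeds)


# Toy usage: 500 DSU types, a stand-in PLM embedding of vocab 32000 x dim 768.
plm_embedding = nn.Embedding(32000, 768)
bridge = CrossAttentionReprogrammer(num_units=500, plm_embedding=plm_embedding)
dsu_ids = torch.randint(0, 500, (2, 40))
plm_inputs = bridge(dsu_ids)  # (2, 40, 768)
```

In practice, attending over the full PLM vocabulary can be memory-heavy; a subset of prototype token embeddings could serve as keys/values instead, but that choice is likewise an assumption rather than a detail taken from the paper.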