연구 분야: Artificial Intelligence
학회: Neural Computing and Applications
End-to-end speech translation (ST) has attracted substantial attention due to its less error accumulation and lower latency. Based on triplet ST data speech-transcription-translation , multitask learning (MTL) that utilizes machine translation transcription-translation or automatic speech recognition speech-transcription task to assist in training ST model is widely employed. However, current MTL methods often suffer from subnet role mismatch, semantic inconsistency, or usually focus only on transferring knowledge from automatic speech recognition (ASR) or machine translation (MT) task, leading to insufficient transferring of cross-task knowledge. To solve these problems, we propose the multitask co-training network (MCTN) to jointly model ST, MT, and ASR tasks. Specifically, the ASR task enables the acoustic encoder to better capture local information of speech frames, and the MT task enhances the translation capability of the model. MCTN benefits from three key aspects: a well-designed multitask framework to fully exploit the association between tasks, a model decoupling and parameter sharing method to maintain consistency in subnet roles, and a co-training strategy to utilize task information in triplet ST data. Our experiments show that MCTN achieves state-of-the-art results, when using only MuST-C dataset, and significantly outperforms strong end-to-end ST baselines and cascaded systems when external data are available.
| 발행 연도 | 2024년 |
|---|---|
| 인용수 | 2 |
| 출판 국가 | Andorra, China |
| 사이트 | Springer |
| 좋아요 수 | 0 |