Research field: Databases
Venue: The Journal of Supercomputing
In multimodal sentiment analysis, modeling the relationships between different modalities and fusing them is highly challenging. A key difficulty is that sentiment representations and their distributions are imbalanced across modalities, so the fusion process drifts away from the multimodal sentiment-semantic space. We propose MECG, a novel fusion framework based on graph convolutional networks (GCNs) that provides an efficient approach to fusing unaligned multimodal sequences. First, a multimodal enhancement module uses the text modality to enhance the visual and acoustic modalities, making them more discriminative and thereby assisting the subsequent aggregation. Next, we construct text-driven multimodal feature graphs for modality fusion, which effectively handle the imbalance among modalities during graph-convolution aggregation. Finally, we integrate the fused information extracted by MECG into the verbal representation, dynamically shifting the original word representations toward the most accurate multimodal sentiment-semantic space. Experiments on two publicly available datasets, CMU-MOSI and CMU-MOSEI, demonstrate the effectiveness and superiority of our model.
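The abstract does not give MECG's equations, so the following is only a rough, hypothetical illustration of the general idea of text-driven graph fusion. It builds a three-node graph (text, visual, acoustic), links each non-text node to the text node with a weight taken from a softmax over dot-product similarity to the text features, applies one standard GCN layer with symmetric normalization, and then shifts word vectors by a scaled copy of the fused text-node output. The helper names (`text_driven_adjacency`, `gcn_layer`, `shift_words`), the similarity-based edge weighting, the utterance-level node granularity, and the scaling factor `alpha` are all assumptions rather than the authors' design, and the enhancement module is omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_driven_adjacency(nodes: Matrix, text_index: int = 0) -> Matrix:
    """Hypothetical text-centred graph: each non-text node is linked to the text
    node with a weight from a softmax over its dot-product similarity to the
    text features. Self-loops are added, as in a standard GCN."""
    n = len(nodes)
    text = nodes[text_index]
    others = [i for i in range(n) if i != text_index]
    scores = [sum(a * b for a, b in zip(nodes[i], text)) for i in others]
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, w in zip(others, softmax(scores)):
        adj[i][text_index] = w  # symmetric edge between node i and the text node
        adj[text_index][i] = w
    return adj


def normalize(adj: Matrix) -> Matrix:
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj_norm: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_norm . H . W)."""
    n, d_in, d_out = len(feats), len(weight), len(weight[0])
    # Aggregate neighbour features with the normalized adjacency.
    agg = [[sum(adj_norm[i][k] * feats[k][c] for k in range(n)) for c in range(d_in)]
           for i in range(n)]
    # Project with W and apply ReLU.
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in)))
             for o in range(d_out)]
            for i in range(n)]


def shift_words(words: Matrix, fused: Vector, alpha: float = 0.5) -> Matrix:
    """Hypothetical integration step: move every word vector by alpha * fused."""
    return [[w + alpha * f for w, f in zip(word, fused)] for word in words]


if __name__ == "__main__":
    # Toy utterance-level features for [text, visual, acoustic].
    nodes = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]  # untrained stand-in for a learned W
    adj = text_driven_adjacency(nodes)
    hidden = gcn_layer(normalize(adj), nodes, identity)
    fused = hidden[0]  # read out the text node as the fused representation
    print(shift_words([[0.0, 0.0], [1.0, 1.0]], fused))
```

Reading out the text node after aggregation keeps the fused vector anchored to the verbal modality, which loosely mirrors the text-driven idea described in the abstract; an actual model would presumably use learned weights and sequence-level (unaligned) nodes instead of these toy values.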
| Field | Value |
|---|---|
| Publication year | 2024 |
| Citations | 0 |
| Publication country | Anguilla, China |
| Site | Springer |