Evaluation and Benchmarking the Agent Marketing Dialogue Scenarios with Large Language Models


Research field: Artificial Intelligence



Conference: 2025 7th International Conference on Natural Language Processing (ICNLP)


Abstract

In natural language processing, human-like abilities such as text comprehension, reasoning, and generation are often used to gauge the intelligence of an artificial intelligence (AI) model. To a certain extent, a model's ability to process textual information reflects its ability to use natural language. However, the natural language capabilities of Large Language Models (LLMs) in real-world engineering scenarios of agent marketing conversations remain underexplored. We therefore construct the LURG-TEXT dataset and its benchmark, which cover the task scenarios in which natural language processing is commonly applied in marketing and include the basic metrics for running the benchmark. Using LURG-TEXT, we conduct an empirical study of several existing open-source base LLMs and closed-source LLMs, focusing on the text generation and comprehension capabilities of these models in agent marketing dialogue systems. The results show that current state-of-the-art LLMs perform well in text generation and comprehension, while their reasoning ability still needs improvement.


Authors

Kexin Zhao
Natural Language Processing Group, Beijing Jiaotong University, Beijing, China

Jinan Xu
Natural Language Processing Group, Beijing Jiaotong University, Beijing, China

Jing Shi
Natural Language Processing Group, Beijing Jiaotong University, Beijing, China

Paper information

Publication year: 2025
Citations: 4
Country of publication: China
Site: IEEE