Capturing research literature attitude towards sustainable development goals: an LLM-based topic modeling approach


Research field: Software Development



Venue: Journal of Big Data


Abstract

The world is facing a multitude of challenges that hinder the development of human civilization and the well-being of humanity on the planet. The Sustainable Development Goals (SDGs) were formulated by the United Nations in 2015 to address these global challenges by 2030. Natural language processing techniques can help uncover discussions on SDGs within research literature. We propose a completely automated pipeline that (1) fetches content from academic literature and prepares datasets dedicated to five groups of SDGs; (2) performs topic modeling, a statistical technique used to identify topics in large collections of textual data; and (3) enables topic exploration through keyword-based search and topic frequency time series extraction. For topic modeling, we use the BERTopic stack, scaled up to handle large corpora of textual documents (we find hundreds of topics on hundreds of thousands of documents), and introduce (i) a novel LLM-based embeddings computation for representing scientific abstracts in the continuous space, and (ii) a hyperparameter optimizer to efficiently find the best configuration for any new dataset. We additionally visualize the results on interactive dashboards that report the topics’ temporal evolution. Results are made inspectable and explorable, contributing to the interpretability of the topic modeling process. The proposed LLM-based topic modeling pipeline allows users to capture insights on the evolution of the attitude toward SDGs within scientific abstracts in the 2006–2023 time span. All the results are reproducible by using our system; the workflow can be generalized to be applied at any point in time to any large corpus of text data.
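Step (3) of the pipeline, keyword-based search and topic frequency time series extraction, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each abstract has already been assigned a topic id by a fitted topic model, and the toy data, the function names `topic_time_series` and `search_topics`, and the choice to normalize counts by the number of documents per year are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: one (publication year, topic id) pair per abstract.
# In a real run these would come from the fitted topic model.
assignments = [
    (2006, 0), (2006, 1), (2007, 0), (2007, 0), (2008, 1), (2008, 2),
]

# Hypothetical top keywords for each topic id.
topic_keywords = {
    0: ["renewable", "energy", "solar"],
    1: ["poverty", "income", "inequality"],
    2: ["water", "sanitation", "access"],
}


def topic_time_series(assignments):
    """Return {topic_id: {year: share of that year's documents}}."""
    docs_per_year = Counter(year for year, _ in assignments)
    counts = defaultdict(Counter)
    for year, topic in assignments:
        counts[topic][year] += 1
    # Years in which a topic does not occur get a share of 0.0.
    return {
        topic: {
            year: counts[topic][year] / docs_per_year[year]
            for year in sorted(docs_per_year)
        }
        for topic in counts
    }


def search_topics(topic_keywords, query):
    """Return the sorted ids of topics whose keyword list contains the query."""
    q = query.lower()
    return sorted(
        topic
        for topic, words in topic_keywords.items()
        if q in (w.lower() for w in words)
    )


series = topic_time_series(assignments)
matches = search_topics(topic_keywords, "Energy")
for topic in matches:
    print(topic, series[topic])
```

Normalizing by the yearly document count keeps a topic's trend comparable across years with different publication volumes; raw counts would be an equally reasonable choice if absolute volume is what matters.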


Author Profile
Francesco Invernici

Department of Electronics, Information and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133 Milano, Italy

Andorra
Author Profile
Francesca Curati

Department of Electronics, Information and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133 Milano, Italy

Andorra
Author Profile
Jelena Jakimov

Department of Electronics, Information and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133 Milano, Italy

Andorra

Paper information

Publication year: 2025
Citations: 3
Country of publication: Andorra
Site: Springer

Related papers (40)