Gender Bias and Under-Representation in Natural Language Processing Across Human Languages


Research field: Artificial Intelligence



Conference: AIES '21: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society


Abstract

Natural Language Processing (NLP) systems are at the heart of many critical automated decision-making systems that make crucial recommendations about our future world. However, these systems reflect a wide range of biases, from gender bias to bias in which voices they represent. In this paper, a team including speakers of 9 languages (Chinese, Spanish, English, Arabic, German, French, Farsi, Urdu, and Wolof) reports and analyzes measurements of gender bias in the Wikipedia corpora for these 9 languages. In the process, we also document how our work exposes crucial gaps in the NLP pipeline for many languages. Despite substantial investments in multilingual support, the modern NLP pipeline still systematically and dramatically under-represents the majority of human voices in the NLP-guided decisions that are shaping our collective future. We develop extensions to profession-level and corpus-level gender bias metric calculations originally designed for English and apply them to 8 other languages, including languages such as Spanish, Arabic, German, French, and Urdu that have grammatically gendered nouns, including distinct feminine, masculine, and neuter profession words. We compare these gender bias measurements across the Wikipedia corpora in different languages as well as across some corpora of more traditional literature.


Authors

Yan Chen, Clarkson University, Potsdam, NY, USA
Christopher Mahoney, Clarkson University, Potsdam, NY, USA
Isabella Grasso, Clarkson University, Potsdam, NY, USA

Paper information

Publication year: 2021
Citations: 16
Country of publication: United States
Site: ACM
