ITRT (IT Research Trends)

The image and ground truth dataset of Mongolian movable-type newspapers for text recognition

Research field: Artificial Intelligence

Paper keywords: #mongolian #1952 #86 #1947 #newspapers

Venue: International Journal on Document Analysis and Recognition (IJDAR)

Abstract

OCR approaches have advanced considerably in recent years thanks to the resurgence of deep learning. However, to the best of our knowledge, there is little work on Mongolian movable-type document recognition. One major hurdle is the lack of a well-labeled, domain-specific dataset for training robust models. This paper aims to create the first Mongolian movable-type text-image dataset for OCR research. We collated 771 paragraph-level pages segmented from 34 newspapers published from 1947 to 1952. For each page, word- and line-level text transcriptions and boundary annotations are recorded. The dataset consists of 86,578 word occurrences and 9,711 text-line images in total, with a vocabulary size of 7,964. It was built from scratch through image collection, text transcription, text-image alignment, and manual correction. Moreover, an official train/test partition is defined, on which typical text segmentation and recognition experiments are run to establish strong baselines. The dataset is available for research, and we encourage researchers to develop and test new methods with it.
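To give a feel for the scale these figures imply, the short sketch below derives a few average ratios from the counts quoted in the abstract. The function and variable names are illustrative assumptions, not part of the dataset or its tooling, and the averages assume all four counts describe the same full collection, as the abstract's "in total" suggests.

```python
# Counts quoted in the abstract (the only inputs taken from the source).
PAGES = 771            # paragraph-level pages
TEXT_LINES = 9_711     # text-line images
WORD_TOKENS = 86_578   # word occurrences
VOCABULARY = 7_964     # vocabulary size


def derived_ratios(pages: int, text_lines: int, word_tokens: int, vocabulary: int) -> dict:
    """Return simple averages implied by the headline counts.

    Hypothetical helper for illustration only; it is not shipped with the dataset.
    """
    return {
        "lines_per_page": text_lines / pages,
        "words_per_line": word_tokens / text_lines,
        "words_per_page": word_tokens / pages,
        "occurrences_per_vocab_entry": word_tokens / vocabulary,
    }


ratios = derived_ratios(PAGES, TEXT_LINES, WORD_TOKENS, VOCABULARY)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

These are plain averages; the abstract does not report how lines or words are distributed across individual pages, so the ratios say nothing about variance.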

📄 Paper information

Publication year	2023
Citations	2
Country of publication	China
Site	Springer

Authors: Min Lu, Feilong Bao, Hui Zhang, Guanglai Gao

Related papers (1)

MNASR: A Free Speech Corpus For Mongolian Speech Recognition And Accompanied Baselines
