Nexdata-AI/4720000-Groups-Chinese-Uighur-Parallel-Corpus-Data

4720000-Groups-Chinese-Uighur-Parallel-Corpus-Data

Description

A parallel translation corpus of 4,720,000 Chinese-Uighur sentence pairs, stored as TXT documents. The data has been cleaned, desensitized, and quality-inspected, and can serve as a base corpus for text data analysis and for fields such as machine translation. For more details, see: https://www.nexdata.ai/datasets/nlu/1185?source=Github

Storage format

TXT

Data content

Chinese-Uighur Parallel Corpus Data

Data size

4.72 million Chinese-Uighur sentence pairs. The Chinese sentences contain 22 characters on average.
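
As a rough illustration of how the pair count and the average Chinese sentence length could be checked on a local copy, here is a minimal Python sketch. It rests on assumptions not stated in this README: that each line of a TXT file holds one pair, with the Chinese sentence first and the Uighur sentence second, separated by a tab. The function names and the file name `corpus.txt` are hypothetical; adjust the separator and column order to match the delivered files.

```python
from statistics import mean


def parse_pairs(lines, sep="\t"):
    """Yield (chinese, uighur) tuples from lines shaped like 'zh<sep>ug'.

    ASSUMPTION: one pair per line, Chinese first, tab-separated.
    The real file layout is not documented in this README.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split(sep, 1)
        if len(parts) != 2:
            continue  # skip lines without a separator
        zh, ug = parts[0].strip(), parts[1].strip()
        if zh and ug:
            yield zh, ug


def average_chinese_length(pairs):
    """Return the mean number of characters in the Chinese side of the pairs."""
    lengths = [len(zh) for zh, _ in pairs]
    return mean(lengths) if lengths else 0.0


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a real file.
    sample = ["你好\tياخشىمۇسىز\n", "\n", "谢谢你\tرەھمەت\n", "no separator here\n"]
    pairs = list(parse_pairs(sample))
    print(len(pairs), average_chinese_length(pairs))
```

To run the same check on a file, pass an open file handle instead of the sample list, for example `open("corpus.txt", encoding="utf-8")`, where the path is a placeholder.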

Language

Chinese, Uighur

Application scenario

Machine translation

Accuracy rate

90%

Licensing Information

Commercial License