CTC 2021

A Chinese Text Correction Dataset for Native Speakers

What is CTC 2021?

CTC 2021 is a Chinese Text Correction competition that aims to detect and correct errors in given texts. Participants must identify erroneous text, determine the type of each error, and correct it.

Over the past decade, automatic Chinese text correction has mostly been studied on texts written by non-native learners of Chinese. However, most of the errors in such texts seldom appear in texts written by native speakers. A proofreading system aimed at native Chinese speakers is therefore needed, and it would be of practical help to institutions such as government agencies, the media, and the publishing industry. To this end, CTC 2021 collects texts written by native Chinese speakers from the Internet as validation and test data, and evaluates trained models on three kinds of errors: spelling errors, grammatical errors, and faulty wording or phrasing.

Getting Started

You may also be interested in a quick baseline system.

Download baseline model:

Resources for Chinese text correction (CTC):

Official Submission

To preserve the integrity of test results, we do not release the test and challenge sets to the public. Instead, you must upload your model to CodaLab so that we can run it on the test and challenge sets for you. Follow the instructions on CodaLab (the process is similar to SQuAD submission). Submission Tutorial

Have Questions?

Ask us questions at our GitHub repository

Leaderboard

Evaluation Metrics: Detection and correction are scored jointly. The final score is 0.8 * detection score + 0.2 * correction score, where both the detection score and the correction score are F-scores.
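The weighting above can be sketched in a few lines of Python. This is a minimal illustration, not the official scorer: the helper names `f_score` and `final_score` are hypothetical, F1 (beta = 1) is assumed because the leaderboard columns are named detect_f1 and correct_f1, and the unit over which true positives, false positives and false negatives are counted (characters, edits, or sentences) is an assumption left to the caller.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true-positive, false-positive and false-negative counts.

    Assumption: beta=1 (plain F1). What counts as one "positive" is not
    specified here and must follow the official evaluation script.
    Returns 0.0 when precision and recall are both zero or undefined.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def final_score(detect_f: float, correct_f: float) -> float:
    """Weighted score: 0.8 * detection F-score + 0.2 * correction F-score."""
    return 0.8 * detect_f + 0.2 * correct_f


if __name__ == "__main__":
    # First-place leaderboard row (S&A): detect_f1 = 68, correct_f1 = 64.6
    print(round(final_score(68, 64.6), 3))  # 67.32
```

Applied to the leaderboard, the same formula reproduces each final_score from its two F1 columns, for example 0.8 * 62.405 + 0.2 * 57.205 = 61.365 for the second-place team.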

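Rank | Date | Team | Affiliation | detect_f1 | correct_f1 | final_score
1 | Sep 3, 2021 | S&A | Soochow University (苏州大学); Alibaba DAMO Academy (阿里巴巴达摩院) | 68 | 64.6 | 67.32
2 | Sep 3, 2021 | 改的都队 | Tsinghua University (清华大学) | 62.405 | 57.205 | 61.365
3 | Sep 3, 2021 | znv_sentosa | Shenzhen ZNV Technology Co., Ltd. (深圳力维智联技术有限公司) | 55.035 | 43.055 | 52.639
4 | Sep 3, 2021 | C&L | Beijing Institute of Technology (北京理工大学) | 51.126 | 48.649 | 50.631
5 | Sep 3, 2021 | MDatai | Shanghai Midu Information Technology Co., Ltd.; Sina Weiredian Research Institute (上海蜜度信息技术有限公司-新浪微热点研究院) | 51.233 | 47.374 | 50.461
6 | Sep 3, 2021 | YCC | Beijing Youmei Technology Co., Ltd. (北京铀媒科技有限公司) | 49.804 | 42.745 | 48.392
7 | Sep 3, 2021 | NJU-NLP | Nanjing University NLP Lab (南京大学自然语言处理实验室) | 49.02 | 39.651 | 47.146
8 | Sep 3, 2021 | 四条人 | Ant Financial (蚂蚁金服) | 41.505 | 35.68 | 40.34
9 | Sep 3, 2021 | ai编程的小拓 | TRS Information Technology Co., Ltd. (拓尔思信息技术股份有限公司) | 38.372 | 31.628 | 37.023
10 | Sep 3, 2021 | zybank | Zhongyuan Bank (中原银行) | 37.863 | 33.217 | 36.934
11 | Sep 3, 2021 | 华夏—龙盈战队 | Hua Xia Bank Co., Ltd. (华夏银行股份有限公司); Longyingzhida (Beijing) Technology Co., Ltd. (龙盈智达(北京)科技有限公司) | 28.646 | 21.875 | 27.292
12 | Sep 3, 2021 | yl_test | Beijing Yuanli Future Technology Co., Ltd. (北京猿力未来科技有限公司) | 26.516 | 16.925 | 24.598
13 | Sep 3, 2021 | 晓梦 | People's Daily Online (人民网) | 20.997 | 14.173 | 19.632
14 | Sep 3, 2021 | only-one | Beijing University of Posts and Telecommunications (北邮) | 20.709 | 14.468 | 19.461
15 | Sep 3, 2021 | zndx纠错好难 | Central South University (中南大学) | 17.714 | 9.714 | 16.114
16 | Sep 3, 2021 | DAWN | MideaAIIC | 6.326 | 3.128 | 5.686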