Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Posts
Future Blog Post
Published:
This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false, as in the excerpt below.
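For reference, this is the relevant setting in the site's Jekyll configuration (a minimal excerpt, assuming an otherwise standard _config.yml; the rest of the file is omitted):

```yaml
# _config.yml (excerpt)
# When false, Jekyll skips posts whose date is in the future at build time.
future: false
```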
Blog Post number 4
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 3
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 2
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 1
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Portfolio
Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2
Publications
Disconnected Recurrent Neural Networks for Text Categorization
Published in ACL 2018, 2018
Recommended citation: Baoxin Wang. 2018. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018).
Download Paper
CJRC: A Reliable Human-Annotated Benchmark DataSet for Chinese Judicial Reading Comprehension
Published in CCL 2019, 2019
Recommended citation: Xingyi Duan, Baoxin Wang, Ziyue Wang, Wentao Ma, Yiming Cui, Dayong Wu, Shijin Wang, Ting Liu, Tianxiang Huo, Zhen Hu, Heng Wang, Zhiyuan Liu. Chinese Computational Linguistics (CCL 2019).
Download Paper
IFlyLegal: A Chinese Legal System for Consultation, Law Searching, and Document Analysis
Published in EMNLP 2019, 2019
Recommended citation: Ziyue Wang, Baoxin Wang, Xingyi Duan, Dayong Wu, Shijin Wang, Guoping Hu, and Ting Liu. 2019. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing: System Demonstrations (EMNLP 2019).
Download Paper
Dynamic Connected Networks for Chinese Spelling Check
Published in Findings of ACL 2021, 2021
Chinese spelling check (CSC) is the task of detecting and correcting spelling errors in Chinese text. Most state-of-the-art work on CSC adopts a BERT-based non-autoregressive language model, which relies on an output-independence assumption. This inappropriate assumption prevents BERT-based models from learning dependencies among target tokens, leading to incoherent corrections. To address this issue, we propose a novel architecture named Dynamic Connected Networks (DCN), which generates candidate Chinese characters via a Pinyin Enhanced Candidate Generator and then uses an attention-based network to model the dependencies between adjacent Chinese characters. Experimental results show that our proposed method achieves new state-of-the-art performance on three human-annotated datasets. (A minimal sketch of this candidate-lattice decoding idea follows the download link below.)
Recommended citation: Baoxin Wang, Wanxiang Che, Dayong Wu, Shijin Wang, Guoping Hu, and Ting Liu. 2021. In Findings of the Association for Computational Linguistics (ACL 2021).
Download Paper
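As a rough, unofficial sketch of the decoding idea described in the abstract — not the paper's actual implementation: assume a candidate generator has already produced per-position candidate scores (unary) and an attention-style module has scored adjacent candidate pairs (pairwise); a Viterbi pass then picks the most coherent character sequence. All names and scores here are illustrative assumptions.

```python
import numpy as np

def viterbi_decode(unary, pairwise):
    """unary: (T, K) scores for K candidate characters at each of T positions;
    pairwise: (T-1, K, K) connection scores between adjacent candidates."""
    T, K = unary.shape
    score = unary[0].copy()              # best path score ending at each candidate
    back = np.zeros((T, K), dtype=int)   # backpointers for path recovery
    for t in range(1, T):
        # total[i, j]: best path ending in candidate i at t-1, then candidate j at t
        total = score[:, None] + pairwise[t - 1] + unary[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # Walk the backpointers from the best final candidate.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy example: 3 positions with 2 candidates each.
unary = np.array([[1.0, 0.2], [0.1, 0.9], [0.5, 0.4]])
pairwise = np.zeros((2, 2, 2))
pairwise[0, 0, 1] = 0.8   # favor candidate 0 -> candidate 1 across positions 0-1
print(viterbi_decode(unary, pairwise))  # [0, 1, 0]
```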
CCTC: A Cross-Sentence Chinese Text Correction Dataset for Native Speakers
Published in COLING 2022, 2022
Chinese text correction (CTC) focuses on detecting and correcting Chinese spelling errors and grammatical errors. Most existing datasets for Chinese spelling check (CSC) and Chinese grammatical error correction (GEC) contain single sentences written by Chinese-as-a-second-language (CSL) learners. We find that errors made by native speakers differ significantly from those made by non-native speakers, which makes it inappropriate to use the existing test sets directly to evaluate text correction systems for native speakers. Some errors also require cross-sentence information to be identified and corrected. In this paper, we propose a cross-sentence Chinese text correction dataset for native speakers. Concretely, we manually annotated 1,500 texts written by native speakers; the dataset consists of 30,811 sentences and more than 1,000,000 Chinese characters. It contains four types of errors: spelling errors, redundant words, missing words, and word-ordering errors. We also test several state-of-the-art models on the dataset. The experimental results show that even the best-performing model falls 20 points short of human performance, indicating that there is still much room for improvement. We hope the new dataset can fill the gap in cross-sentence text correction for native Chinese speakers. (A purely hypothetical sketch of such an annotation record follows the download link below.)
Recommended citation: Baoxin Wang, Xingyi Duan, Dayong Wu, Wanxiang Che, Zhigang Chen, and Guoping Hu. 2022. In Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022).
Download Paper
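This page does not show the dataset's actual file format; the following is a purely hypothetical Python sketch of what one annotated error record covering the four error types might look like. Every field name here is an assumption for illustration only.

```python
from dataclasses import dataclass
from enum import Enum

class ErrorType(Enum):
    SPELLING = "spelling"       # a wrong character that should be substituted
    REDUNDANT = "redundant"     # an extra word that should be deleted
    MISSING = "missing"         # a word that should be inserted
    WORD_ORDER = "word_order"   # words that should be reordered

@dataclass
class ErrorAnnotation:
    text_id: int          # which of the 1,500 annotated texts
    sentence_id: int      # sentence index; context may span neighboring sentences
    start: int            # character offset where the error span begins
    end: int              # character offset where the error span ends
    error_type: ErrorType
    correction: str       # corrected text for the span ("" for a deletion)

# Hypothetical usage: a redundant-word error spanning characters 12-14 of sentence 3.
example = ErrorAnnotation(7, 3, 12, 14, ErrorType.REDUNDANT, "")
print(example)
```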
CINO: A Chinese Minority Pre-trained Language Model
Published in COLING 2022, 2022
Multilingual pre-trained language models have shown impressive performance on cross-lingual tasks, which greatly facilitates natural language processing applications for low-resource languages. However, there are still some languages on which current multilingual models do not perform well. In this paper, we propose CINO (Chinese Minority Pre-trained Language Model), a multilingual pre-trained language model for Chinese minority languages. It covers Standard Chinese, Yue Chinese, and six other ethnic minority languages. To evaluate the cross-lingual ability of the multilingual model on ethnic minority languages, we collect documents from Wikipedia and news websites and construct two text classification datasets, WCM (Wiki-Chinese-Minority) and CMNews (Chinese-Minority-News). We show that CINO notably outperforms the baselines on various classification tasks. The CINO model and the datasets are publicly available at http://cino.hfl-rc.com. (A brief, unofficial usage sketch follows the download link below.)
Recommended citation: Ziqing Yang, Zihang Xu, Yiming Cui, Baoxin Wang, Min Lin, Dayong Wu, and Zhigang Chen. 2022. In Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022).
Download Paper
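As a brief, unofficial usage sketch (not from the paper): CINO is distributed by the HFL organization, and a checkpoint can be loaded with Hugging Face Transformers to encode text. The model id below is an assumption; see http://cino.hfl-rc.com for the authoritative releases.

```python
from transformers import AutoModel, AutoTokenizer

# Model id is assumed; confirm the released checkpoints at http://cino.hfl-rc.com.
model_id = "hfl/cino-base-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode a Standard Chinese sentence; minority-language text works the same way.
inputs = tokenizer("你好，世界。", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```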
TiBERT: A Non-autoregressive Pre-trained Model for Text Editing
Published in NLPCC 2023, 2023
Recommended citation: Baoxin Wang, Ziyue Wang, Wanxiang Che, Dayong Wu, Rui Zhang, Bo Wang, Shijin Wang. Natural Language Processing and Chinese Computing (NLPCC 2023).
Download Paper
LM-Combiner: A Contextual Rewriting Model for Chinese Grammatical Error Correction
Published in COLING 2024, 2024
Recommended citation: Yixuan Wang, Baoxin Wang, Yijun Liu, Dayong Wu, Wanxiang Che. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (COLING 2024).
Download Paper
Improving Grammatical Error Correction via Contextual Data Augmentation
Published in Findings of ACL 2024, 2024
Recommended citation: Yixuan Wang, Baoxin Wang, Yijun Liu, Qingfu Zhu, Dayong Wu, and Wanxiang Che. 2024. In Findings of the Association for Computational Linguistics: ACL 2024.
Download Paper
SparkRA: A Retrieval-Augmented Knowledge Service System Based on Spark Large Language Model
Published in EMNLP 2024 Demo, 2024
Recommended citation: Dayong Wu, Jiaqi Li, Baoxin Wang, Honghong Zhao, Siyuan Xue, Yanjie Yang, Zhijun Chang, Rui Zhang, Li Qian, Bo Wang, Shijin Wang, Zhixiong Zhang, and Guoping Hu. 2024. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (EMNLP 2024).
Download Paper
Improving Contextual Faithfulness of Large Language Models via Retrieval Heads-Induced Optimization
Published in ACL 2025, 2025
Recommended citation: Lei Huang, Xiaocheng Feng, Weitao Ma, Yuchun Fan, Xiachong Feng, Yangfan Ye, Weihong Zhong, Yuxuan Gu, Baoxin Wang, Dayong Wu, Guoping Hu, and Bing Qin. 2025. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025).
Download Paper
Alleviating Hallucinations from Knowledge Misalignment in Large Language Models via Selective Abstention Learning
Published in ACL 2025, 2025
Recommended citation: Lei Huang, Xiaocheng Feng, Weitao Ma, Yuchun Fan, Xiachong Feng, Yuxuan Gu, Yangfan Ye, Liang Zhao, Weihong Zhong, Baoxin Wang, Dayong Wu, Guoping Hu, Lingpeng Kong, Tong Xiao, Ting Liu, and Bing Qin. 2025. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025).
Download Paper
Ontology-Guided Reverse Thinking Makes Large Language Models Stronger on Knowledge Graph Question Answering
Published in ACL 2025, 2025
Recommended citation: Runxuan Liu, Luobei Luobei, Jiaqi Li, Baoxin Wang, Ming Liu, Dayong Wu, Shijin Wang, and Bing Qin. 2025. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025).
Download Paper
SCITAT: A Question Answering Benchmark for Scientific Tables and Text Covering Diverse Reasoning Types
Published in Findings of ACL 2025, 2025
Recommended citation: Xuanliang Zhang, Dingzirui Wang, Baoxin Wang, Longxu Dou, Xinyuan Lu, Keyan Xu, Dayong Wu, and Qingfu Zhu. 2025. In Findings of the Association for Computational Linguistics: ACL 2025.
Download Paper
Talks
Talk 1 on Relevant Topic in Your Field
Published:
This is a description of your talk, which is a markdown file that can be all markdown-ified like any other post. Yay markdown!
Conference Proceeding talk 3 on Relevant Topic in Your Field
Published:
This is a description of your conference proceedings talk; note the different value in the type field. You can put anything in this field.
Teaching
Teaching experience 1
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Teaching experience 2
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.