Publications

You can also find my articles on my Google Scholar profile.

Conference Papers
Improving Contextual Faithfulness of Large Language Models via Retrieval Heads-Induced Optimization

Published in ACL 2025, 2025

Recommended citation: Lei Huang, Xiaocheng Feng, Weitao Ma, Yuchun Fan, Xiachong Feng, Yangfan Ye, Weihong Zhong, Yuxuan Gu, Baoxin Wang, Dayong Wu, Guoping Hu, and Bing Qin. 2025. Improving Contextual Faithfulness of Large Language Models via Retrieval Heads-Induced Optimization. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025).
Download Paper

CINO: A Chinese Minority Pre-trained Language Model

Published in COLING 2022, 2022

Multilingual pre-trained language models have shown impressive performance on cross-lingual tasks. This greatly facilitates the application of natural language processing to low-resource languages. However, there are still some languages on which current multilingual models do not perform well. In this paper, we propose CINO (Chinese Minority Pre-trained Language Model), a multilingual pre-trained language model for Chinese minority languages. It covers Standard Chinese, Yue Chinese, and six other ethnic minority languages. To evaluate the cross-lingual ability of the multilingual model on ethnic minority languages, we collect documents from Wikipedia and news websites, and construct two text classification datasets, WCM (Wiki-Chinese-Minority) and CMNews (Chinese-Minority-News). We show that CINO notably outperforms the baselines on various classification tasks. The CINO model and the datasets are publicly available at http://cino.hfl-rc.com.

Recommended citation: Ziqing Yang, Zihang Xu, Yiming Cui, Baoxin Wang, Min Lin, Dayong Wu, and Zhigang Chen. 2022. CINO: A Chinese Minority Pre-trained Language Model. In Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022).
Download Paper

CCTC: A Cross-Sentence Chinese Text Correction Dataset for Native Speakers

Published in COLING 2022, 2022

Chinese text correction (CTC) focuses on detecting and correcting Chinese spelling errors and grammatical errors. Most existing datasets for Chinese spelling check (CSC) and Chinese grammatical error correction (GEC) focus on single sentences written by Chinese-as-a-second-language (CSL) learners. We find that errors made by native speakers differ significantly from those produced by non-native speakers. These differences make it inappropriate to use the existing test sets directly to evaluate text correction systems for native speakers. Some errors also require cross-sentence information to be identified and corrected. In this paper, we propose a cross-sentence Chinese text correction dataset for native speakers. Concretely, we manually annotated 1,500 texts written by native speakers. The dataset consists of 30,811 sentences and more than 1,000,000 Chinese characters. It contains four types of errors: spelling errors, redundant words, missing words, and word ordering errors. We also test some state-of-the-art models on the dataset. The experimental results show that even the best-performing model scores 20 points lower than humans, which indicates that there is still much room for improvement. We hope that the new dataset can fill the gap in cross-sentence text correction for native Chinese speakers.

Recommended citation: Baoxin Wang, Xingyi Duan, Dayong Wu, Wanxiang Che, Zhigang Chen, and Guoping Hu. 2022. CCTC: A Cross-Sentence Chinese Text Correction Dataset for Native Speakers. In Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022).
Download Paper

Dynamic Connected Networks for Chinese Spelling Check

Published in Findings of ACL 2021, 2021

Chinese spelling check (CSC) is the task of detecting and correcting spelling errors in Chinese text. Most state-of-the-art works on the CSC task adopt a BERT-based non-autoregressive language model, which relies on the output independence assumption. This inappropriate independence assumption prevents BERT-based models from learning the dependencies among target tokens, resulting in incoherent outputs. To address this issue, we propose a novel architecture named Dynamic Connected Networks (DCN), which generates candidate Chinese characters via a Pinyin Enhanced Candidate Generator and then utilizes an attention-based network to model the dependencies between two adjacent Chinese characters. The experimental results show that our proposed method achieves a new state-of-the-art performance on three human-annotated datasets.

Recommended citation: Baoxin Wang, Wanxiang Che, Dayong Wu, Shijin Wang, Guoping Hu, and Ting Liu. 2021. Dynamic Connected Networks for Chinese Spelling Check. In Findings of the Association for Computational Linguistics (ACL 2021).
Download Paper